Re: Insert performance metrics

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Insert performance metrics

Aaron Burghardt
Hi,

OK, I am impressed that you can insert 2000 records/second on indexed  
columns.  I have an application that inserts CSV data into an SQLite  
database, and inserting 15 million records is taking about 2 hours  
with no indices (this is on a PowerMac Dual G5 2.0 GHz, 1.5 GB RAM),  
though two fields each require a simple sub-select on to populate.  
My attempts to improve performance include:
        * performing all the inserts inside one transaction
        * using prepared statements
        * dropping all the indices prior to running
        * setting sync mode to Normal
        * setting cache_size to huge values, like 300,000
        * creating temp tables in memory

Do my times sound reasonable or should I continue to look for  
optimizations?  Any other suggestions?

Thanks,

Aaron

On Sep 2, 2005, at 3:05 PM, D. Richard Hipp wrote:

> On Fri, 2005-09-02 at 11:59 +0200, Michael Schoen wrote:
>
>> (1)     Multiple Insert Statements
>> We need to insert around 300-500 datasets/sec constantly (24/7)  
>> with 8
>> till 16 fields indexed. So far we are using mysql, not only due to  
>> the
>> general dbms speed, but mainly because it has a csv import  
>> interface. We
>> figured out, that we can not insert that many data in row, when using
>> SQL statements, 'cause the SQL-parser just takes to much time...  
>> If we
>> use the CSV import methods, we gain at least a 30% performance boost.
>> While reading through the sqlite wiki, I found no evidence about any
>> multiple insert statement to decrease the SQL-Parsing time...
>> What would you guys propose?
>>
>
> 1.  Call sqlite3_prepare() to parse a generic INSERT statement.
> 2.  Call sqlite3_bind() to bind values to the generic statement.
> 3.  Call sqlite3_step() to do the INSERT
> 4.  Call sqlite3_reset() to reset the statement so that it can
>     run again.
> 5.  Go back and repeat steps 2-4 as many times as you like.
>
> This allows you to insert many different rows while only running
> the parser onces.  You do get about a 30% speed boost doing this.
>
> Out of curiosity, I created a table with 17 columns and 8 indices
> just to see how fast I could insert into it using SQLite.  On
> my laptop, I could easily do 2000 inserts/second (all in a single
> transaction) even when parsing each INSERT separately.  I would
> expect better performance on a workstation and a 30% or better
> performance improvement using the technique described above.
>
> On the other hand,  performance will fall off logarithmically
> as the size of the database increases.
>
>>
>> (2)     Indexing:
>> What kind of index implementations are you guys supporting/using?
>> B-Tree, R-Tree, Full-Text? What about Multiple-Column Indexes?
>> Supported?
>>
>
> B-tree indices. You can have as many columns in an index as
> you want.  But only the first 31 columns will be used to
> help speed searches.  (On the other hand, who ever creates
> an index with 32 or more columns?)
>


Reply | Threaded
Open this post in threaded view
|

Re: Insert performance metrics

D. Richard Hipp
Aaron Burghardt <[hidden email]> wrote:

> Hi,
>
> OK, I am impressed that you can insert 2000 records/second on indexed  
> columns.  I have an application that inserts CSV data into an SQLite  
> database, and inserting 15 million records is taking about 2 hours  
> with no indices (this is on a PowerMac Dual G5 2.0 GHz, 1.5 GB RAM),  
> though two fields each require a simple sub-select on to populate.  
> My attempts to improve performance include:
> * performing all the inserts inside one transaction
> * using prepared statements
> * dropping all the indices prior to running
> * setting sync mode to Normal
> * setting cache_size to huge values, like 300,000
> * creating temp tables in memory
>
> Do my times sound reasonable or should I continue to look for  
> optimizations?  Any other suggestions?
>

You should be getting 10x better performance.  Perhaps your
subselect is slowing you down.
--
D. Richard Hipp <[hidden email]>