Bulk Insert in Sqlite3

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Bulk Insert in Sqlite3

Niti Agarwal
Hi,
We are using Sqlite3 with Golang to do bulk insert.
Need to insert close to 10 Million records to sqlite3 in around 30 mins.

Currently I am saving 100 Records under one transaction with below settings:

           PRAGMA synchronous = NORMAL;
   PRAGMA journal_mode = WAL;
   PRAGMA auto_vacuum = FULL;

I read about SQLITE_MAX_SQL_LENGTH, Is it better to tune this value?

Also, read about sql bind feature, but not very sure how to do in Golang?
Is there any better way to do bulk insert?

Thanks,
Niti
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bulk Insert in Sqlite3

Simon Slavin-3

On 7 Feb 2017, at 5:36am, Niti Agarwal <[hidden email]> wrote:

> Need to insert close to 10 Million records to sqlite3 in around 30 mins.

This number of records requires so much space the temporary data will not fit inside a cache.  Consider using a counter so that the transaction is ended and a new one begun every 1000 records.  Or perhaps every 10000 records.  Try both ways and see which is faster.

> Currently I am saving 100 Records under one transaction with below settings:
>
>           PRAGMA synchronous = NORMAL;
>   PRAGMA journal_mode = WAL;
>   PRAGMA auto_vacuum = FULL;
>
> I read about SQLITE_MAX_SQL_LENGTH, Is it better to tune this value?

The value SQLITE_MAX_SQL_LENGTH is about the length in text of one SQL command.  I doubt it has any impact on your problem.  You should probably leave it as it is.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bulk Insert in Sqlite3

Rowan Worth-2
In reply to this post by Niti Agarwal
Hi Niti,

There's on need to build a giant SQL string; a transaction can span
multiple statements. To bind in golang place a ? within your SQL query and
provide the values as additional arguments to the Exec/Query function. eg,
after using db.Begin() to create a transaction

    tx, err := db.Begin()
    if err != nil {
        return err
    }
    _, err = tx.Exec("INSERT INTO table1 VALUES (?, ?, ?)", column1,
column2, column3)
    if err != nil {
         tx.Rollback()
         return err
    }
    err = tx.Commit()
    if err != nil {
         return err
    }

Note that golang's sql transaction abstraction doesn't map perfectly to
sqlite. Golang does not allow any further operations on the Tx following a
call to Tx.Commit() or Tx.Rollback(). But in sqlite a transaction remains
open if COMMIT fails because the database is locked. If you want to be able
to retry the COMMIT in this situation you must manually manage transactions
via db.Exec("BEGIN")/db.Exec("COMMIT") instead of db.Begin()/tx.Commit().

-Rowan


On 7 February 2017 at 13:36, Niti Agarwal <[hidden email]> wrote:

> Hi,
> We are using Sqlite3 with Golang to do bulk insert.
> Need to insert close to 10 Million records to sqlite3 in around 30 mins.
>
> Currently I am saving 100 Records under one transaction with below
> settings:
>
>            PRAGMA synchronous = NORMAL;
>    PRAGMA journal_mode = WAL;
>    PRAGMA auto_vacuum = FULL;
>
> I read about SQLITE_MAX_SQL_LENGTH, Is it better to tune this value?
>
> Also, read about sql bind feature, but not very sure how to do in Golang?
> Is there any better way to do bulk insert?
>
> Thanks,
> Niti
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bulk Insert in Sqlite3

Niti Agarwal
In reply to this post by Simon Slavin-3
Thanks for your reply. The length matters as I am appending 100 rows at a
time in a sql statement. It is making very fast as compared to single sql
insert in For loop.
Copied the code below for reference. Here the list size is 100
Any better way to do this? Like I read about *bind*...not sure how I can do
it in Golang.



As shown below:

func StoreFileList(db *sql.DB, fileList []File) {
sql_PathInfo := `
INSERT OR IGNORE INTO path_info(Id,FilePath) VALUES`
sql_FileInfo := `
INSERT OR REPLACE INTO file_info(
PathId,
FileName,
FileSize,
IsDir,
IsExported,
Level,
ModTime
) VALUES `

valsPath := []interface{}{}
valsFile := []interface{}{}
for _, file := range fileList {
sql_PathInfo += "(?,?),"
sql_FileInfo += "((SELECT Id FROM path_info WHERE FilePath = ?),?, ?, ?, ?,
?, ?),"
valsPath = append(valsPath, nil, file.FilePath)
valsFile = append(valsFile, file.FilePath, file.FileName, file.FileSize,
file.IsDir, file.IsExported, file.Level, file.ModTime)
}

sql_PathInfo = sql_PathInfo[0 : len(sql_PathInfo)-1]
sql_FileInfo = sql_FileInfo[0 : len(sql_FileInfo)-1]

PathInfoStmt, err := db.Prepare(sql_PathInfo)
if err != nil {
panic(err)
}
fileInfoStmt, err := db.Prepare(sql_FileInfo)
if err != nil {
panic(err)
}
defer PathInfoStmt.Close()
defer fileInfoStmt.Close()

tx, err := db.Begin()
if err != nil {
panic(err)
}
_, err = tx.Stmt(PathInfoStmt).Exec(valsPath...)
_, err1 := tx.Stmt(fileInfoStmt).Exec(valsFile...)
if err != nil || err1 != nil {
if err != nil {
panic(err)
}
if err1 != nil {
panic(err1)
}
fmt.Println("doing rollback")
tx.Rollback()
} else {
tx.Commit()
}
}

On Tue, Feb 7, 2017 at 11:56 AM, Simon Slavin <[hidden email]> wrote:

>
> On 7 Feb 2017, at 5:36am, Niti Agarwal <[hidden email]> wrote:
>
> > Need to insert close to 10 Million records to sqlite3 in around 30 mins.
>
> This number of records requires so much space the temporary data will not
> fit inside a cache.  Consider using a counter so that the transaction is
> ended and a new one begun every 1000 records.  Or perhaps every 10000
> records.  Try both ways and see which is faster.
>
> > Currently I am saving 100 Records under one transaction with below
> settings:
> >
> >           PRAGMA synchronous = NORMAL;
> >   PRAGMA journal_mode = WAL;
> >   PRAGMA auto_vacuum = FULL;
> >
> > I read about SQLITE_MAX_SQL_LENGTH, Is it better to tune this value?
>
> The value SQLITE_MAX_SQL_LENGTH is about the length in text of one SQL
> command.  I doubt it has any impact on your problem.  You should probably
> leave it as it is.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bulk Insert in Sqlite3

Simon Slavin-3

On 7 Feb 2017, at 6:56am, Niti Agarwal <[hidden email]> wrote:

> Thanks for your reply. The length matters as I am appending 100 rows at a
> time in a sql statement. It is making very fast as compared to single sql
> insert in For loop.
> Copied the code below for reference. Here the list size is 100
> Any better way to do this? Like I read about *bind*...not sure how I can do
> it in Golang.

Okay.  By using an INSERT command with lots of value sets you are doing things more efficiently than I thought.  Each INSERT is its own transaction so you are doing 100 INSERTs per transaction.

I am not familiar with GoLang.  Can someone say if it’s appropriate to use the two functions

PathInfoStmt, err := db.Prepare(sql_PathInfo)
err = tx.Stmt(PathInfoStmt).Exec(valsPath…)

like that ?  I would expect Prepare to go with Step instead but I could understand if the library being used makes it okay.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bulk Insert in Sqlite3

Rowan Worth-2
On 7 February 2017 at 15:11, Simon Slavin <[hidden email]> wrote:

>
> On 7 Feb 2017, at 6:56am, Niti Agarwal <[hidden email]> wrote:
>
> > Thanks for your reply. The length matters as I am appending 100 rows at a
> > time in a sql statement. It is making very fast as compared to single sql
> > insert in For loop.
> > Copied the code below for reference. Here the list size is 100
> > Any better way to do this? Like I read about *bind*...not sure how I can
> do
> > it in Golang.
>
> Okay.  By using an INSERT command with lots of value sets you are doing
> things more efficiently than I thought.  Each INSERT is its own transaction
> so you are doing 100 INSERTs per transaction.
>

It would a lot simpler though to move the db.Begin() outside the for loop
and execute multiple INSERT statements within the loop.


> I am not familiar with GoLang.  Can someone say if it’s appropriate to use
> the two functions
>
> PathInfoStmt, err := db.Prepare(sql_PathInfo)
> err = tx.Stmt(PathInfoStmt).Exec(valsPath…)
>
> like that ?  I would expect Prepare to go with Step instead but I could
> understand if the library being used makes it okay.
>

Yes, that's ok. These are not sqlite specific bindings, go takes a ODBC
like approach where a standard interface[1] is used to connect to various
database engines.

[1] https://golang.org/pkg/database/sql/

Exec() is designed for INSERT/UPDATEs where you're not asking the DB for
information. The read equivalent is Query() which returns a sql.Rows
structure that you iterate over using Rows.Next().

-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bulk Insert in Sqlite3

James K. Lowden
In reply to this post by Rowan Worth-2
On Tue, 7 Feb 2017 14:42:13 +0800
Rowan Worth <[hidden email]> wrote:

> Note that golang's sql transaction abstraction doesn't map perfectly
> to sqlite. Golang does not allow any further operations on the Tx
> following a call to Tx.Commit() or Tx.Rollback(). But in sqlite a
> transaction remains open if COMMIT fails because the database is
> locked.

That suggests a failure of imagination in the Go SQLite driver.  

Most DBMSs return only fatal errors for COMMIT (or success, of
course).  There is no such thing as "try again".  

SQLite does not behave that way by default, but can be made to do so.
If a busy handler is installed that never returns zero, the only time
SQLite returns SQLITE_BUSY is when it determines the transaction cannot
be completed i.e., that there's a deadlock.  

ISTM the Go driver should supply a default busy handler that reduces
COMMIT errors to fatal ones.  It might expose to the application knobs
to change the retry interval.  Of course, if the application can
replace the busy handler, it also has the option of removing it, and
restoring the default behavior.  

--jkl
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bulk Insert in Sqlite3

Warren Young
In reply to this post by Niti Agarwal
On Feb 6, 2017, at 10:36 PM, Niti Agarwal <[hidden email]> wrote:

> I read about SQLITE_MAX_SQL_LENGTH,

If this is why you’re making many transactions, there’s no requirement that all of the SQL that’s part of a single transaction be in a single SQL string given to the DB.  You can execute a bare “BEGIN TRANSACTION” SQL statement, do your 10 million separate insert statements, and then execute a COMMIT statement, at which time all of the rows will be visible to other users of that DB.

This is simpler, still fast, and doesn’t require that you do all that manual batching-up, or worry about internal SQLite buffer sizes.

> Also, read about sql bind feature, but not very sure how to do in Golang?

There are a bunch of SQLite wrappers for Go:

   http://go-lang.cat-v.org/library-bindings

Which one are you using?

The first one on that list appears to be MIA, but the second one includes this module, which includes a Bind() function:

   https://github.com/kuroneko/gosqlite3/blob/master/query_parameter.go

Seems straightforward to me.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users