importing a large TSV file

importing a large TSV file

Gert Van Assche-2
Hi all,

I need to create an SQLite db from a large (30 GB) TSV file.
Are there any settings I can give to the db to speed up the import?

thank you

Gert

Re: importing a large TSV file

Domingo Alvarez Duarte
Hello Gert !

I normally do this (be aware that if there is a power outage during the
import, the database can end up corrupted):

===

PRAGMA synchronous = OFF;

BEGIN;

-- processing here

COMMIT;

PRAGMA synchronous = ON;

===

Cheers !

Re: importing a large TSV file

Donald Griggs
In reply to this post by Gert Van Assche-2
I believe it's also helpful to avoid creating indexes (and enforcing
foreign keys) until after the import.
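
For instance, a minimal sketch of that ordering (the table t and its columns a and b are just placeholders):

PRAGMA foreign_keys = OFF;       -- don't enforce foreign keys while loading
CREATE TABLE t(a TEXT, b TEXT);  -- no indexes declared yet
-- ... bulk-load the rows here ...
CREATE INDEX t_a ON t(a);        -- build indexes once the data is in
PRAGMA foreign_keys = ON;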

Re: importing a large TSV file

Simon Slavin-3
In reply to this post by Gert Van Assche-2

If you're doing it using the SQLite CLI tool, then just rely on the tool to do it in the most convenient manner.
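
For reference, a rough sketch of the CLI route (this assumes a reasonably recent sqlite3 shell, a tab-separated file named data.tsv, and a target table t; if t doesn't exist, .import creates it and takes the column names from the first row of the file):

.mode tabs
.import data.tsv t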

If you're doing it in your own code, here is the pattern which this list thinks is fastest:

PRAGMA foreign_keys = OFF;
BEGIN;
    DROP TABLE IF EXISTS ...
    CREATE TABLE ...
END;
BEGIN;
    ... import all your data into the table ...
END;
BEGIN;
    ... CREATE all your INDEXes ...
    VACUUM; -- optional
END;
PRAGMA foreign_keys = ON;

The VACUUM step isn't really needed.  It might speed up future access or reduce the file space used, especially for big databases.  In case you wonder later: SQLite is easily able to handle a table with 30 GB of data.

Re: importing a large TSV file

Simon Slavin-3
On 1 Apr 2019, at 2:41pm, Simon Slavin <[hidden email]> wrote:

> BEGIN;
>    ... CREATE all your INDEXes ...
>    VACUUM; -- optional
> END;

Apologies.  The documentation says

"A VACUUM will fail if there is an open transaction"

I'm not sure whether this means that VACUUM must be outside the transaction, like the particular PRAGMA I used.  So it would be better to move the VACUUM after the END.
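
In other words, something along these lines:

BEGIN;
    ... CREATE all your INDEXes ...
END;
VACUUM;  -- run outside any transaction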

Re: importing a large TSV file

David Raymond
I believe it's basically saying that, the way it's implemented, VACUUM can't be rolled back, and therefore can't be done in the middle of a transaction; it has to be on its own.
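
That matches what the shell reports if you try it (roughly; exact wording may vary by version):

sqlite> BEGIN;
sqlite> VACUUM;
Error: cannot VACUUM from within a transaction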

Re: importing a large TSV file

Rowan Worth-2
In reply to this post by Domingo Alvarez Duarte

You can probably leave the synchronous pragma alone without overly affecting import time, tbh. The main thing is putting all the work into one transaction, and at that point you're down to 2 or 3 sync() calls. I guess there's still value in not having to wait for the journal to hit disk, though. Maybe even PRAGMA journal_mode = OFF would be appropriate.
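
As a rough end-to-end sketch of that idea (t is a placeholder table; set the journal mode back afterwards if the connection stays open):

PRAGMA journal_mode = OFF;     -- optional: skip the rollback journal entirely
BEGIN;
-- ... all the INSERTs into t go here ...
COMMIT;
PRAGMA journal_mode = DELETE;  -- back to the default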

-Rowan

Re: importing a large TSV file

Gert Van Assche-2
Thank you all for these tips. Very helpful!
