Article about using sqlite3 in Python

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Article about using sqlite3 in Python

Simon Slavin-3
I don’t know enough about Python to evaluate this, but the sqlite3 side is sound, and some readers might find it useful.

<https://remusao.github.io/posts/2017-10-21-few-tips-sqlite-perf.html>

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Article about using sqlite3 in Python

David Raymond
Basic but good. execute will prepare the statement each time through, whereas executemany will prepare once and then just bind for each run through, which is where the time saving comes from. Depending on how complex what you're doing is though it can be hard, or more often awkward to create an iterator for executemany.

The other thing I'd add is that you can upgrade your SQLite version by replacing the sqlite3.dll file in your \pythonxx\DLLs folder and get the latest new SQLite features. Just remember that if you're then passing your script off to someone else to run that they might still have the originally installed .dll from 6 years ago and won't be able to handle new feature X.


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Sunday, October 22, 2017 12:47 PM
To: SQLite mailing list
Subject: [sqlite] Article about using sqlite3 in Python

I don’t know enough about Python to evaluate this, but the sqlite3 side is sound, and some readers might find it useful.

<https://remusao.github.io/posts/2017-10-21-few-tips-sqlite-perf.html>

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Article about using sqlite3 in Python

Damien Sykes-Lindley
Hi David,
Very useful regarding the SQLite updating procedure. I was looking in lib or
wherever the Python SQLite bindings are, but all I could find there was
sqlite3.pyd.
As for executemany, I personally use executescript which then allows me to
use begin/commit statements which also saves a lot of time.
Cheers.
Damien.
-----Original Message-----
From: David Raymond
Sent: Monday, October 23, 2017 3:36 PM
To: SQLite mailing list
Subject: Re: [sqlite] Article about using sqlite3 in Python

Basic but good. execute will prepare the statement each time through,
whereas executemany will prepare once and then just bind for each run
through, which is where the time saving comes from. Depending on how complex
what you're doing is though it can be hard, or more often awkward to create
an iterator for executemany.

The other thing I'd add is that you can upgrade your SQLite version by
replacing the sqlite3.dll file in your \pythonxx\DLLs folder and get the
latest new SQLite features. Just remember that if you're then passing your
script off to someone else to run that they might still have the originally
installed .dll from 6 years ago and won't be able to handle new feature X.


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On
Behalf Of Simon Slavin
Sent: Sunday, October 22, 2017 12:47 PM
To: SQLite mailing list
Subject: [sqlite] Article about using sqlite3 in Python

I don’t know enough about Python to evaluate this, but the sqlite3 side is
sound, and some readers might find it useful.

<https://remusao.github.io/posts/2017-10-21-few-tips-sqlite-perf.html>

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Article about using sqlite3 in Python

Keith Medcalf
In reply to this post by David Raymond

On Monday, 23 October, 2017 08:36, David Raymond <[hidden email]> wrote:

NB:  References to pysqlite2 refer to the builtin sqlite3 wrapper in python.  This wrapper was called pysqlite2 long before it was included in the standard library.  It is still updated to fix bugs and accommodate changes in the SQL DDL/DML spoken by SQLite3, updates which are eventually included in the sqlite3 module in the standard library.

>Basic but good. execute will prepare the statement each time through,
>whereas executemany will prepare once and then just bind for each run
>through, which is where the time saving comes from. Depending on how
>complex what you're doing is though it can be hard, or more often
>awkward to create an iterator for executemany.

execute vs executemany is merely syntactic sugar.  If and only if you have disabled statement caching will the statement be recompiled each time.  In all cases the sequence of sequences parameter is used as a binding to multiple calls to execute.  executemany is the equivalent of:

executemany(sql, param)
->
for args in param:
    execute(sql, args)

and has no (or very little) benefit over doing the same thing in Python itself (yes, the iteration is run in C, but it is still entirely calling into the Python VM to do all its work, so there is very little if any benefit there -- the only real benefit over the above is that locating the cached statement is only done once when you use executemany and the same C stmt pointer is used over and over again without looking it up in the cache, which does save a few cycles -- however since since the bulk of the time is spent inside the I/O bound SQLite3 library code where the GIL is released, this effectively makes minimal difference in the long run).

>The other thing I'd add is that you can upgrade your SQLite version
>by replacing the sqlite3.dll file in your \pythonxx\DLLs folder and
>get the latest new SQLite features. Just remember that if you're then
>passing your script off to someone else to run that they might still
>have the originally installed .dll from 6 years ago and won't be able
>to handle new feature X.

Plus of course you have to deal with the brain-dead and horrendous prayers required to handle transactions properly with pysqlite2 unless you are sure to always open your database with isolation_level=None to turn off the automagic faddling about, which is sure to trip you up if you try to do anything complicated.

If you are going to use SQLite with Python, you might want to look at APSW.  You can build APSW as either a wrapper for the system SQLITE3.DLL/SQLITE3.SO library (like pysqlite2) or as a completely self-contained library containing both SQLite3 and APSW

https://github.com/rogerbinns/apsw

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Simon Slavin
>Sent: Sunday, October 22, 2017 12:47 PM
>To: SQLite mailing list
>Subject: [sqlite] Article about using sqlite3 in Python
>
>I don’t know enough about Python to evaluate this, but the sqlite3
>side is sound, and some readers might find it useful.
>
><https://remusao.github.io/posts/2017-10-21-few-tips-sqlite-perf.html>
>
>Simon.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Article about using sqlite3 in Python

David Raymond
How does one enable or disable statement caching then? For something as basic as just inserting integers executemany is about 3 times faster.

Python version: 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)]
Python sqlite3 module version: 2.6.0
sqlite3.dll version: 3.20.1
executemany takes: 8.674 seconds for 10,000,000 inserts
execute takes 30.173 seconds for 10,000,000 inserts


import sqlite3
import sys
import time

numInts = 10000000

def genFunc():
    for x in xrange(numInts):
        yield (x,)

def execute():
    startTime = time.clock()
    with sqlite3.connect(":memory:", isolation_level = None) as conn:
        conn.execute("begin transaction;")
        conn.execute("create table foo (bar int);")
        for x in xrange(numInts):
            conn.execute("insert into foo values (?);", (x,))
        conn.commit()
    conn.close()
    endTime = time.clock()
    return endTime - startTime

def executemany():
    startTime = time.clock()
    with sqlite3.connect(":memory:", isolation_level = None) as conn:
        conn.execute("begin transaction;")
        conn.execute("create table foo (bar int);")
        conn.executemany("insert into foo values (?);", genFunc())
        conn.commit()
    conn.close()
    endTime = time.clock()
    return endTime - startTime

if __name__ == "__main__":
    print "Python version: {0}".format(sys.version)
    print "Python sqlite3 module version: {0}".format(sqlite3.version)
    print "sqlite3.dll version: {0}".format(sqlite3.sqlite_version)
    print "executemany takes: {0:,.3f} seconds for {1:,d} inserts".format(executemany(), numInts)
    print "execute takes {0:,.3f} seconds for {1:,d} inserts".format(execute(), numInts)

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Keith Medcalf
Sent: Monday, October 23, 2017 11:55 AM
To: SQLite mailing list
Subject: Re: [sqlite] Article about using sqlite3 in Python

execute vs executemany is merely syntactic sugar.  If and only if you have disabled statement caching will the statement be recompiled each time.  In all cases the sequence of sequences parameter is used as a binding to multiple calls to execute.  executemany is the equivalent of:

executemany(sql, param)
->
for args in param:
    execute(sql, args)

and has no (or very little) benefit over doing the same thing in Python itself (yes, the iteration is run in C, but it is still entirely calling into the Python VM to do all its work, so there is very little if any benefit there -- the only real benefit over the above is that locating the cached statement is only done once when you use executemany and the same C stmt pointer is used over and over again without looking it up in the cache, which does save a few cycles -- however since since the bulk of the time is spent inside the I/O bound SQLite3 library code where the GIL is released, this effectively makes minimal difference in the long run).
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Article about using sqlite3 in Python

Keith Medcalf

According to the documentation, it is the cache_statements keyword to the sqlite3 connect function:

The sqlite3 module internally uses a statement cache to avoid SQL parsing overhead. If you want to explicitly set the number of statements that are cached for the connection, you can set the cached_statements parameter. The currently implemented default is to cache 100 statements.

According to the documentation for APSW, it is the statementcachesize keyword to the Connection function.

Both default to 100.

And yes, in this particularly simple case where there is "practically nothing" done inside the database engine, the overhead of jumping out of the "C" interface code to back to python is noticeable.

Of course, if you build a carray containing the 10,000,000 integers and pass that to a "create table ... as select ..." and let the database do all the work, the time for executing the statement is under 1 second ... (but the time to build the carray object in python is about 5 seconds).

So it is all a trade-off.  For example parsing CSV files into an SQLITE3 database is far more efficient using the csv extension which can import and process about 100,000 rows/second.  executemany is not an option because a "great deal" of processing must be done on each row as it is inserted, versus individual row insertion which does about 10,000 rows per second.  This is because most of the processing can be pushed down to the database engine when using the csv extension and it is very efficient compared to python.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of David Raymond
>Sent: Monday, 23 October, 2017 11:54
>To: SQLite mailing list
>Subject: Re: [sqlite] Article about using sqlite3 in Python
>
>How does one enable or disable statement caching then? For something
>as basic as just inserting integers executemany is about 3 times
>faster.
>
>Python version: 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30)
>[MSC v.1500 32 bit (Intel)]
>Python sqlite3 module version: 2.6.0
>sqlite3.dll version: 3.20.1
>executemany takes: 8.674 seconds for 10,000,000 inserts
>execute takes 30.173 seconds for 10,000,000 inserts
>
>
>import sqlite3
>import sys
>import time
>
>numInts = 10000000
>
>def genFunc():
>    for x in xrange(numInts):
>        yield (x,)
>
>def execute():
>    startTime = time.clock()
>    with sqlite3.connect(":memory:", isolation_level = None) as conn:
>        conn.execute("begin transaction;")
>        conn.execute("create table foo (bar int);")
>        for x in xrange(numInts):
>            conn.execute("insert into foo values (?);", (x,))
>        conn.commit()
>    conn.close()
>    endTime = time.clock()
>    return endTime - startTime
>
>def executemany():
>    startTime = time.clock()
>    with sqlite3.connect(":memory:", isolation_level = None) as conn:
>        conn.execute("begin transaction;")
>        conn.execute("create table foo (bar int);")
>        conn.executemany("insert into foo values (?);", genFunc())
>        conn.commit()
>    conn.close()
>    endTime = time.clock()
>    return endTime - startTime
>
>if __name__ == "__main__":
>    print "Python version: {0}".format(sys.version)
>    print "Python sqlite3 module version:
>{0}".format(sqlite3.version)
>    print "sqlite3.dll version: {0}".format(sqlite3.sqlite_version)
>    print "executemany takes: {0:,.3f} seconds for {1:,d}
>inserts".format(executemany(), numInts)
>    print "execute takes {0:,.3f} seconds for {1:,d}
>inserts".format(execute(), numInts)
>
>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Keith Medcalf
>Sent: Monday, October 23, 2017 11:55 AM
>To: SQLite mailing list
>Subject: Re: [sqlite] Article about using sqlite3 in Python
>
>execute vs executemany is merely syntactic sugar.  If and only if you
>have disabled statement caching will the statement be recompiled each
>time.  In all cases the sequence of sequences parameter is used as a
>binding to multiple calls to execute.  executemany is the equivalent
>of:
>
>executemany(sql, param)
>->
>for args in param:
>    execute(sql, args)
>
>and has no (or very little) benefit over doing the same thing in
>Python itself (yes, the iteration is run in C, but it is still
>entirely calling into the Python VM to do all its work, so there is
>very little if any benefit there -- the only real benefit over the
>above is that locating the cached statement is only done once when
>you use executemany and the same C stmt pointer is used over and over
>again without looking it up in the cache, which does save a few
>cycles -- however since since the bulk of the time is spent inside
>the I/O bound SQLite3 library code where the GIL is released, this
>effectively makes minimal difference in the long run).
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Article about using sqlite3 in Python

E.Pasma
In reply to this post by Simon Slavin-3
22 okt 2017, 18:47, Simon Slavin:

> I don’t know enough about Python to evaluate this, but the sqlite3  
> side is sound, and some readers might find it useful.
>
> <https://remusao.github.io/posts/2017-10-21-few-tips-sqlite-perf.html>
>
> Simon.

It is written very well. However for readers with an SQLite background  
someting must still be added. The built-in sqlite3 module begins a  
transaction behind the scenes as soon as a DML statement is executed.  
And expects a commit to be programmed or a connection to be explicitly  
closed. Or a WITH construction. From an SQLite point of view one is  
better of with the Other Sqlite Wrapper (apsw).
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users