High performance and concurrency

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

High performance and concurrency

Shevek
Hi,

I would like to have truly concurrent access to an sqlite database, that
is, the ability for multiple connections to read from the database
simultaneously. I'm using Java with xerial's sqlite-jdbc, customized to
let me mmap the entire database into RAM, and with additional debugging
symbols for perf. The database is about 30Gb, fully read-only, and the
connections are opened as such.

What I think is happening is that either a pthread mutex or a database
lock is serializing the accesses, so each thread blocks the others.

Queries are taking a few seconds, even with covering indexes, and I have
the RAM bandwidth available, so I'd really like to use it.

Any pointers?

Thank you.

S.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: High performance and concurrency

Simon Slavin-3


On 1 Mar 2018, at 7:24am, Shevek <[hidden email]> wrote:

> What I think is happening is that either a pthread mutex or a database lock is serializing the accesses, so each thread blocks the others.

What journal mode are you using ?

<https://sqlite.org/pragma.html#pragma_journal_mode>

If it's not WAL, try WAL.  If it's currently WAL, try DELETE.  Once you've changed it see if this changes how your program behaves.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: High performance and concurrency

Shevek
On 02/28/2018 11:45 PM, Simon Slavin wrote:
>
> On 1 Mar 2018, at 7:24am, Shevek <[hidden email]> wrote:
>
>> What I think is happening is that either a pthread mutex or a database lock is serializing the accesses, so each thread blocks the others.

To be specific, I'm concerned about is the line
sqlite3_mutex_enter(db->mutex) at the top of sqlite3_step(). Since my
queries are spending all their time in sqliteVdbeExec(), which is
reached through that path, I assume db->mutex is preventing concurrency.

Our main hotspots in the query are sqliteVdbeExec() and updating the
btree pointer to point to a new page (I forget the call name). We can't
do much about the cost of execution; we've mmap'd everything to avoid
the I/O, we're using covering indexes to help with locality, we've
sorted our query keys to attempt to reduce index page seeks, and now we
want to use concurrency and splitting the logic in our query to exploit
memory bandwidth.

Now I've traced this again, I'm looking warily at SQLITE_OPEN_NOMUTEX
because we need thread-safety, as in, sqlite's internal data structures
must be handled correctly in the presence of multiple threads or passing
a connection between threads (safely in the JMM); we just don't need
serialization of database reads and writes, because nothing we do has a
serializable side-effect. Is SQLITE_OPEN_NOMUTEX the answer?

> What journal mode are you using ?

I'm fairly sure journal mode is NONE for our readonly database. Anyway,
readonly shouldn't write to a journal. We have confirmed that the md5sum
of the database file is unchanged during and after the execution of our
application.

S.

> <https://sqlite.org/pragma.html#pragma_journal_mode>
>
> If it's not WAL, try WAL.  If it's currently WAL, try DELETE.  Once you've changed it see if this changes how your program behaves.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: High performance and concurrency

Hick Gunter
Use 1 connection for each thread. Sharing a connections between threads may cause transactions to be larger than each thread thinks.

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Shevek
Gesendet: Donnerstag, 01. März 2018 09:10
An: SQLite mailing list <[hidden email]>; Simon Slavin <[hidden email]>
Betreff: [EXTERNAL] Re: [sqlite] High performance and concurrency

On 02/28/2018 11:45 PM, Simon Slavin wrote:
>
> On 1 Mar 2018, at 7:24am, Shevek <[hidden email]> wrote:
>
>> What I think is happening is that either a pthread mutex or a database lock is serializing the accesses, so each thread blocks the others.

To be specific, I'm concerned about is the line
sqlite3_mutex_enter(db->mutex) at the top of sqlite3_step(). Since my queries are spending all their time in sqliteVdbeExec(), which is reached through that path, I assume db->mutex is preventing concurrency.

Our main hotspots in the query are sqliteVdbeExec() and updating the btree pointer to point to a new page (I forget the call name). We can't do much about the cost of execution; we've mmap'd everything to avoid the I/O, we're using covering indexes to help with locality, we've sorted our query keys to attempt to reduce index page seeks, and now we want to use concurrency and splitting the logic in our query to exploit memory bandwidth.

Now I've traced this again, I'm looking warily at SQLITE_OPEN_NOMUTEX because we need thread-safety, as in, sqlite's internal data structures must be handled correctly in the presence of multiple threads or passing a connection between threads (safely in the JMM); we just don't need serialization of database reads and writes, because nothing we do has a serializable side-effect. Is SQLITE_OPEN_NOMUTEX the answer?

> What journal mode are you using ?

I'm fairly sure journal mode is NONE for our readonly database. Anyway, readonly shouldn't write to a journal. We have confirmed that the md5sum of the database file is unchanged during and after the execution of our application.

S.

> <https://sqlite.org/pragma.html#pragma_journal_mode>
>
> If it's not WAL, try WAL.  If it's currently WAL, try DELETE.  Once you've changed it see if this changes how your program behaves.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: High performance and concurrency

Shevek
On 03/01/2018 01:24 AM, Hick Gunter wrote:
> Use 1 connection for each thread. Sharing a connections between threads may cause transactions to be larger than each thread thinks.

Why would I have a transaction of non-zero size on a read-only connection?

It looks from the source as if having bCoreMutex=true and
bFullMutex=false will allow us the concurrency we need. I'm going to try
again in a couple of days.

Our journal_mode is OFF.

We use HikariCP, so a connection is in use by one thread at a time with
JMM-safe handoff, and they all share the mmap region.

S.

> -----Ursprüngliche Nachricht-----
> Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Shevek
> Gesendet: Donnerstag, 01. März 2018 09:10
> An: SQLite mailing list <[hidden email]>; Simon Slavin <[hidden email]>
> Betreff: [EXTERNAL] Re: [sqlite] High performance and concurrency
>
> On 02/28/2018 11:45 PM, Simon Slavin wrote:
>>
>> On 1 Mar 2018, at 7:24am, Shevek <[hidden email]> wrote:
>>
>>> What I think is happening is that either a pthread mutex or a database lock is serializing the accesses, so each thread blocks the others.
>
> To be specific, I'm concerned about is the line
> sqlite3_mutex_enter(db->mutex) at the top of sqlite3_step(). Since my queries are spending all their time in sqliteVdbeExec(), which is reached through that path, I assume db->mutex is preventing concurrency.
>
> Our main hotspots in the query are sqliteVdbeExec() and updating the btree pointer to point to a new page (I forget the call name). We can't do much about the cost of execution; we've mmap'd everything to avoid the I/O, we're using covering indexes to help with locality, we've sorted our query keys to attempt to reduce index page seeks, and now we want to use concurrency and splitting the logic in our query to exploit memory bandwidth.
>
> Now I've traced this again, I'm looking warily at SQLITE_OPEN_NOMUTEX because we need thread-safety, as in, sqlite's internal data structures must be handled correctly in the presence of multiple threads or passing a connection between threads (safely in the JMM); we just don't need serialization of database reads and writes, because nothing we do has a serializable side-effect. Is SQLITE_OPEN_NOMUTEX the answer?
>
>> What journal mode are you using ?
>
> I'm fairly sure journal mode is NONE for our readonly database. Anyway, readonly shouldn't write to a journal. We have confirmed that the md5sum of the database file is unchanged during and after the execution of our application.
>
> S.
>
>> <https://sqlite.org/pragma.html#pragma_journal_mode>
>>
>> If it's not WAL, try WAL.  If it's currently WAL, try DELETE.  Once you've changed it see if this changes how your program behaves.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___________________________________________
>   Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the addressee.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: High performance and concurrency

Clemens Ladisch
Shevek wrote:
> Why would I have a transaction of non-zero size on a read-only connection?

What do you mean with "size"?

A read-only transaction still puts a shared lock on the database file.

A read-only transaction will not change the DB file, but SQLite has lots of
internal data structures in memory, and those can change.

To give each thread its own data structures, use separate connections.

> I assume db->mutex is preventing concurrency

"db" is the connection object; each one has its own mutex.


Regards,
Clemens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: High performance and concurrency

Rowan Worth-2
In reply to this post by Shevek
On 2 March 2018 at 03:43, Shevek <[hidden email]> wrote:

> We use HikariCP, so a connection is in use by one thread at a time with
> JMM-safe handoff, and they all share the mmap region.
>

Shevek also wrote:

> What I think is happening is that either a pthread mutex or a database
lock is serializing the accesses, so each thread blocks the others.

I'm not familiar with HikariCP but if it's handing the connection around to
a single thread at a time, sounds like database accesses are serialised
long before sqlite becomes a factor.

-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users