Massively multithreaded SQLite queries

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
29 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Massively multithreaded SQLite queries

Andrew Brown
Hello,

Based on my reading of the documentation it seems like SQLite is a great candidate for cases where you have a large number of threads that only need to read a database simultaneously, in our case for fast data lookup, aggregation, etc. I've been able to generate SQL queries that do this, but once we start running them on a large server with a lot of multithreading going on, I find that we spend a lot of time in __raw_spin_lock - perhaps 75%+ of the time (In one case, 87% of 350 seconds x 72 cores was spent in __raw_spin_lock). This is being run on 64 and 72 core machines, and the more cores I run it on, the slower it ends up going.

To give a bit more detail, I'm working with dotnet core, have written a custom sqlite wrapper (since the dotnet core one lacks the ability to set connection flags beyond readonly, and doesn't have sqlite_prepare_v2() implemented), and I'm running on linux against a bunch of SQLite files in the 2gb-400gb size range. Individual queries are wicked fast, but once I start spreading the load over all the cores by running simultaneous queries I lose the performance advantage and it actually becomes significantly slower.

Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried shared cache, read uncommitted. Tried without shared cache, read uncommitted. Tried WAL. If I write a less efficient query, I spend less time in __raw_spin_lock, but of course then it takes longer for the queries themselves to return.

Any tips to handle massively multithreaded side by side chunked queries on the same database?

Thank you in advance,
Andrew Brown
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Josh Hunsaker
On Fri, Mar 3, 2017 at 1:13 PM, Andrew Brown wrote:
>
> Any tips to handle massively multithreaded side by side chunked
> queries on the same database?

In my (limited) experience, it seems that multithreaded SQLite
acquires a lock on a shared, in-memory b-tree every time a
statement is prepared.  It might not be the database read itself,
but statement preparation that is causing threads to wait for
each other.  Your problem might be mitigated if you could compile
your queries in advance.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Jens Alfke-2

> On Mar 3, 2017, at 2:52 PM, Josh Hunsaker <[hidden email]> wrote:
>
> Your problem might be mitigated if you could compile your queries in advance.

Precompiled statements are a must if you want the best performance (and you’re running the same queries over and over.)

Andrew, how many database handles are open?

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Andrew Brown
Well, in the situation I'm working with, my API is constructing 1835 small SQL jobs to be run, and then passing them off to a structure in which 72 threads are running, each with their own db connection (I assume that's what you mean by a database handle, a DB connection, but please, correct me if I'm wrong!). So in this case, 72 database handles on my bigger server.

Unfortunately, I'm not running the same queries over and over (one example is a 400gb database with 3-5 dimension columns and a few data columns, and this is slicing on that data) so preparing them will have somewhat less benefit in that sense than in other cases. That said, I can still try preparing all the statements before I run any.



-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Jens Alfke
Sent: Friday, March 3, 2017 3:25 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Massively multithreaded SQLite queries


> On Mar 3, 2017, at 2:52 PM, Josh Hunsaker <[hidden email]> wrote:
>
> Your problem might be mitigated if you could compile your queries in advance.

Precompiled statements are a must if you want the best performance (and you’re running the same queries over and over.)

Andrew, how many database handles are open?

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7C5fba3607e2164acbf45808d4628c85a6%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=1xM5isGNHfqUu7yWfoohbYBryDxcgzed0Qlz37K0FDw%3D&reserved=0
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Scott Hess
I'd say you should consider switching to some sort of queue feeding a
worker pool, then experimenting with pool sizes.  Often problems reward the
first few threads you add, but at some point additional threads become a
negative unless the system is specifically designed for high thread counts
(and such design can be annoying for low-thread-count users).  There also
may be caching interactions which improve with a smaller number of threads.

Something else to try is to have multiple databases which are not sharing
page caches (to reduce locking).  It is entirely possible that having 4
databases each with 8 threads could be faster than one database with 32
threads, because they each keep out of each other's way, more.

[None of the above is really SQLite specific.]

-scott


On Fri, Mar 3, 2017 at 3:37 PM, Andrew Brown <
[hidden email]> wrote:

> Well, in the situation I'm working with, my API is constructing 1835 small
> SQL jobs to be run, and then passing them off to a structure in which 72
> threads are running, each with their own db connection (I assume that's
> what you mean by a database handle, a DB connection, but please, correct me
> if I'm wrong!). So in this case, 72 database handles on my bigger server.
>
> Unfortunately, I'm not running the same queries over and over (one example
> is a 400gb database with 3-5 dimension columns and a few data columns, and
> this is slicing on that data) so preparing them will have somewhat less
> benefit in that sense than in other cases. That said, I can still try
> preparing all the statements before I run any.
>
>
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Jens Alfke
> Sent: Friday, March 3, 2017 3:25 PM
> To: SQLite mailing list <[hidden email]>
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
>
> > On Mar 3, 2017, at 2:52 PM, Josh Hunsaker <[hidden email]>
> wrote:
> >
> > Your problem might be mitigated if you could compile your queries in
> advance.
>
> Precompiled statements are a must if you want the best performance (and
> you’re running the same queries over and over.)
>
> Andrew, how many database handles are open?
>
> —Jens
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> https://na01.safelinks.protection.outlook.com/?url=
> http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%
> 2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.
> brown%40economicmodeling.com%7C5fba3607e2164acbf45808d4628c85a6%
> 7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=
> 1xM5isGNHfqUu7yWfoohbYBryDxcgzed0Qlz37K0FDw%3D&reserved=0
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Simon Slavin-3

On 3 Mar 2017, at 11:47pm, Scott Hess <[hidden email]> wrote:

> I'd say you should consider switching to some sort of queue feeding a
> worker pool, then experimenting with pool sizes.

Agreed.  Set up 5 threads which have actual access to the database.  Have your 1835 jobs queue up their jobs for servicing by those 5 threads.

Then mess with the '5' until you find a good value.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Jens Alfke-2

> On Mar 3, 2017, at 3:51 PM, Simon Slavin <[hidden email]> wrote:
>
> Then mess with the '5' until you find a good value.

A common rule of thumb with thread pools is to allocate one thread per CPU core.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Warren Young
In reply to this post by Andrew Brown
On Mar 3, 2017, at 1:13 PM, Andrew Brown <[hidden email]> wrote:
>
> This is being run on 64 and 72 core machines, and the more cores I run it on, the slower it ends up going.

What happens if you rework the app to do only one thing, single-threaded, but run between 64 and 108 instances in parallel on the same system?  (108 == 72 * 1.5, a common work factor value for parallel applications.)

You can use a tool like GNU parallel to manage the work queue:

    https://www.gnu.org/software/parallel/

If nothing else, this would separate out the in-process locking from the file locking, thereby making more clear where the delays are coming from.

Your __raw_spin_lock() result isn’t terribly enlightening because that’s a very low level Linux kernel mechanism.  It doesn’t tell us whether the slowdown is due to file locking in SQLite, mutexes in SQLite, some non-optimal pthreads use in .NET Core, etc.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Andrew Brown
In reply to this post by Andrew Brown
So in this case I'm running on a 72 core machine.

Also the databases have one table in them each... The goal here is to bring all the cores to bear on the many queries - each grabbing the next query to be run and running it, until finally there are no more chunks to run.

Then within my own apis I aggregate the results and form my response.

I'm going to try preparing all the statements before I run any later tonight. I'm also toying with the idea of using a shared nothing architecture, in which I run 72 processes instead of threads, in the hope that there will be less contention that way. Thoughts on that idea?

I really appreciate everyone's responsiveness.

On Mar 3, 2017 4:19 PM, Jens Alfke <[hidden email]> wrote:

> On Mar 3, 2017, at 3:51 PM, Simon Slavin <[hidden email]> wrote:
>
> Then mess with the '5' until you find a good value.

A common rule of thumb with thread pools is to allocate one thread per CPU core.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7C1941086b72c84122e80e08d462942220%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=ytN2iaMT9eiK%2BktzJva1shKgrBfYhxeUHyesJscJnB8%3D&reserved=0

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Keith Medcalf
In reply to this post by Andrew Brown

Does each thread have its own connection?

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Andrew Brown
> Sent: Friday, 3 March, 2017 13:14
> To: [hidden email]
> Subject: [sqlite] Massively multithreaded SQLite queries
>
> Hello,
>
> Based on my reading of the documentation it seems like SQLite is a great
> candidate for cases where you have a large number of threads that only
> need to read a database simultaneously, in our case for fast data lookup,
> aggregation, etc. I've been able to generate SQL queries that do this, but
> once we start running them on a large server with a lot of multithreading
> going on, I find that we spend a lot of time in __raw_spin_lock - perhaps
> 75%+ of the time (In one case, 87% of 350 seconds x 72 cores was spent in
> __raw_spin_lock). This is being run on 64 and 72 core machines, and the
> more cores I run it on, the slower it ends up going.
>
> To give a bit more detail, I'm working with dotnet core, have written a
> custom sqlite wrapper (since the dotnet core one lacks the ability to set
> connection flags beyond readonly, and doesn't have sqlite_prepare_v2()
> implemented), and I'm running on linux against a bunch of SQLite files in
> the 2gb-400gb size range. Individual queries are wicked fast, but once I
> start spreading the load over all the cores by running simultaneous
> queries I lose the performance advantage and it actually becomes
> significantly slower.
>
> Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried
> shared cache, read uncommitted. Tried without shared cache, read
> uncommitted. Tried WAL. If I write a less efficient query, I spend less
> time in __raw_spin_lock, but of course then it takes longer for the
> queries themselves to return.
>
> Any tips to handle massively multithreaded side by side chunked queries on
> the same database?
>
> Thank you in advance,
> Andrew Brown
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Keith Medcalf
In reply to this post by Jens Alfke-2

No, the good rule of thumb is to allocate one thread per CPU.  Depending on the particular multi-core CPU you "may" be able to use all the cores simultaneously but in many cases diminishing returns will set in long before you can execute one thread per core.  If this is an Intel processor that claims it has more than one thread per core be *extremely* careful as that will give you one thread and one half-assed thread per core.  Sometimes, half-assed cores are presented as real cores when they lack a separate execution unit.  Be vary wary.

I presume you are forcing separate threads to separate cores by setting processor affinity on the threads and not just blindly hoping that the OS scheduler does "the right thing"?

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Jens Alfke
> Sent: Friday, 3 March, 2017 17:19
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
>
> > On Mar 3, 2017, at 3:51 PM, Simon Slavin <[hidden email]> wrote:
> >
> > Then mess with the '5' until you find a good value.
>
> A common rule of thumb with thread pools is to allocate one thread per CPU
> core.
>
> —Jens
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Warren Young
On Mar 3, 2017, at 5:51 PM, Keith Medcalf <[hidden email]> wrote:
>
> No, the good rule of thumb is to allocate one thread per CPU.

It depends on the workload.  Parallel make (e.g. “make -jN” in GNU make) typically improves in speed past N=core count to about 1.5x the core count.

SQLite seems like a similar kind of workload: lots of CPU *and* disk I/O, so that you need a bit of oversubscription to keep all the cores busy, because some threads/processes will be stalled on I/O.

Not that any of this is relevant at the current point, since the OP is currently neither I/O bound nor CPU-bound, but lock-bound.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Scott Hess
Yes, if they are lock bound, then they need to have the number of cores
which reduces the locking overhead to the point where it's not degrading
performance too much.  Though I guess the OP really didn't say that (more
CPUs may spend more time in spinlocks and still spend less wallclock time).

Another thing to look at it whether any queries can be more effectively
scheduled.  Having hundreds of completely-unrelated queries seems unlikely
to me.  More likely is that you have a smaller number of queries which are
targeting various different bind parameters.  Preparing a particular query
once, then looping and running each set of bind parameters on one thread is
probably going to be _much_ more efficient.

-scott


On Fri, Mar 3, 2017 at 5:03 PM, Warren Young <[hidden email]> wrote:

> On Mar 3, 2017, at 5:51 PM, Keith Medcalf <[hidden email]> wrote:
> >
> > No, the good rule of thumb is to allocate one thread per CPU.
>
> It depends on the workload.  Parallel make (e.g. “make -jN” in GNU make)
> typically improves in speed past N=core count to about 1.5x the core count.
>
> SQLite seems like a similar kind of workload: lots of CPU *and* disk I/O,
> so that you need a bit of oversubscription to keep all the cores busy,
> because some threads/processes will be stalled on I/O.
>
> Not that any of this is relevant at the current point, since the OP is
> currently neither I/O bound nor CPU-bound, but lock-bound.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Andrew Brown
In reply to this post by Andrew Brown
Yes, each thread has its own connection.

On Mar 3, 2017 4:45 PM, Keith Medcalf <[hidden email]> wrote:

Does each thread have its own connection?

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Andrew Brown
> Sent: Friday, 3 March, 2017 13:14
> To: [hidden email]
> Subject: [sqlite] Massively multithreaded SQLite queries
>
> Hello,
>
> Based on my reading of the documentation it seems like SQLite is a great
> candidate for cases where you have a large number of threads that only
> need to read a database simultaneously, in our case for fast data lookup,
> aggregation, etc. I've been able to generate SQL queries that do this, but
> once we start running them on a large server with a lot of multithreading
> going on, I find that we spend a lot of time in __raw_spin_lock - perhaps
> 75%+ of the time (In one case, 87% of 350 seconds x 72 cores was spent in
> __raw_spin_lock). This is being run on 64 and 72 core machines, and the
> more cores I run it on, the slower it ends up going.
>
> To give a bit more detail, I'm working with dotnet core, have written a
> custom sqlite wrapper (since the dotnet core one lacks the ability to set
> connection flags beyond readonly, and doesn't have sqlite_prepare_v2()
> implemented), and I'm running on linux against a bunch of SQLite files in
> the 2gb-400gb size range. Individual queries are wicked fast, but once I
> start spreading the load over all the cores by running simultaneous
> queries I lose the performance advantage and it actually becomes
> significantly slower.
>
> Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried
> shared cache, read uncommitted. Tried without shared cache, read
> uncommitted. Tried WAL. If I write a less efficient query, I spend less
> time in __raw_spin_lock, but of course then it takes longer for the
> queries themselves to return.
>
> Any tips to handle massively multithreaded side by side chunked queries on
> the same database?
>
> Thank you in advance,
> Andrew Brown
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7Cc0018bf7b8cd49bb588608d46297c70b%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=gad9Y8BC67c4ZAC05GckCghNHgKmm8i5piwgjM2MXgw%3D&reserved=0



_______________________________________________
sqlite-users mailing list
[hidden email]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7Cc0018bf7b8cd49bb588608d46297c70b%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=gad9Y8BC67c4ZAC05GckCghNHgKmm8i5piwgjM2MXgw%3D&reserved=0

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Andrew Brown
In reply to this post by Andrew Brown
Now this is a interesting question. As it happens I /am/ blindly assuming the os would schedule it properly. I will look at my options this evening and see about addressing that.

On Mar 3, 2017 4:52 PM, Keith Medcalf <[hidden email]> wrote:

No, the good rule of thumb is to allocate one thread per CPU.  Depending on the particular multi-core CPU you "may" be able to use all the cores simultaneously but in many cases diminishing returns will set in long before you can execute one thread per core.  If this is an Intel processor that claims it has more than one thread per core be *extremely* careful as that will give you one thread and one half-assed thread per core.  Sometimes, half-assed cores are presented as real cores when they lack a separate execution unit.  Be vary wary.

I presume you are forcing separate threads to separate cores by setting processor affinity on the threads and not just blindly hoping that the OS scheduler does "the right thing"?

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Jens Alfke
> Sent: Friday, 3 March, 2017 17:19
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
>
> > On Mar 3, 2017, at 3:51 PM, Simon Slavin <[hidden email]> wrote:
> >
> > Then mess with the '5' until you find a good value.
>
> A common rule of thumb with thread pools is to allocate one thread per CPU
> core.
>
> —Jens
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7Ca4729ccfb1754b9a534008d46298b2d9%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=C6yNkbM7p0uoib5OgXXYS7%2BEy8cJR1I6N8JjaQ4iMCU%3D&reserved=0



_______________________________________________
sqlite-users mailing list
[hidden email]
https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7Ca4729ccfb1754b9a534008d46298b2d9%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=C6yNkbM7p0uoib5OgXXYS7%2BEy8cJR1I6N8JjaQ4iMCU%3D&reserved=0

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Warren Young
On Mar 3, 2017, at 6:24 PM, Andrew Brown <[hidden email]> wrote:
>
> I /am/ blindly assuming the os would schedule it properly.

Why would it?  Windows won’t do that for you, either.

Now, if you’re using .NET’s thread pool mechanisms, it *should* be making sure it doesn’t oversubscribe the cores too much, but then we’re assuming Microsoft is writing good Linux code here.

The .NET Core code is open, so you can see what it does on Linux.

(I could see, too, but I can’t say I care enough to bother. :) )
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Scott Hess
In reply to this post by Andrew Brown
You might want to try enabling mmap mode:
  pragma mmap_size = 4294967296;
or something like that.  Try to make it larger than your databases.  I'd
expect that if you're running with that many cores, you're _probably_
running in a 64-bit address space, so it'll probably work.

-scott


On Fri, Mar 3, 2017 at 5:22 PM, Andrew Brown <
[hidden email]> wrote:

> Yes, each thread has its own connection.
>
> On Mar 3, 2017 4:45 PM, Keith Medcalf <[hidden email]> wrote:
>
> Does each thread have its own connection?
>
> > -----Original Message-----
> > From: sqlite-users [mailto:[hidden email]]
> > On Behalf Of Andrew Brown
> > Sent: Friday, 3 March, 2017 13:14
> > To: [hidden email]
> > Subject: [sqlite] Massively multithreaded SQLite queries
> >
> > Hello,
> >
> > Based on my reading of the documentation it seems like SQLite is a great
> > candidate for cases where you have a large number of threads that only
> > need to read a database simultaneously, in our case for fast data lookup,
> > aggregation, etc. I've been able to generate SQL queries that do this,
> but
> > once we start running them on a large server with a lot of multithreading
> > going on, I find that we spend a lot of time in __raw_spin_lock - perhaps
> > 75%+ of the time (In one case, 87% of 350 seconds x 72 cores was spent in
> > __raw_spin_lock). This is being run on 64 and 72 core machines, and the
> > more cores I run it on, the slower it ends up going.
> >
> > To give a bit more detail, I'm working with dotnet core, have written a
> > custom sqlite wrapper (since the dotnet core one lacks the ability to set
> > connection flags beyond readonly, and doesn't have sqlite_prepare_v2()
> > implemented), and I'm running on linux against a bunch of SQLite files in
> > the 2gb-400gb size range. Individual queries are wicked fast, but once I
> > start spreading the load over all the cores by running simultaneous
> > queries I lose the performance advantage and it actually becomes
> > significantly slower.
> >
> > Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried
> > shared cache, read uncommitted. Tried without shared cache, read
> > uncommitted. Tried WAL. If I write a less efficient query, I spend less
> > time in __raw_spin_lock, but of course then it takes longer for the
> > queries themselves to return.
> >
> > Any tips to handle massively multithreaded side by side chunked queries
> on
> > the same database?
> >
> > Thank you in advance,
> > Andrew Brown
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > https://na01.safelinks.protection.outlook.com/?url=
> http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%
> 2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.
> brown%40economicmodeling.com%7Cc0018bf7b8cd49bb588608d46297c70b%
> 7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=
> gad9Y8BC67c4ZAC05GckCghNHgKmm8i5piwgjM2MXgw%3D&reserved=0
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> https://na01.safelinks.protection.outlook.com/?url=
> http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%
> 2Fmailman%2Flistinfo%2Fsqlite-users&data=01%7C01%7Candrew.
> brown%40economicmodeling.com%7Cc0018bf7b8cd49bb588608d46297c70b%
> 7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=
> gad9Y8BC67c4ZAC05GckCghNHgKmm8i5piwgjM2MXgw%3D&reserved=0
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Keith Medcalf
In reply to this post by Andrew Brown

As an aside, you may likely find that the OS will schedule multiple processes far more efficiently than it schedules multiple threads (especially from the perspective of scheduling the cores independently).  You may find that it is far more efficient to perhaps schedule a dozen processes with six to a dozen worker threads each (or whatever number seems to be optimal) and use some kind of IPC to submit workload to the individual processes.  

Also, make sure the database is in WAL journal mode -- this will help even if you are only running read transactions + if you have enough memory make sure the page cache and temp space is big enough on each connection and force temp work into memory - but do not overcommit memory.

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Andrew Brown
> Sent: Friday, 3 March, 2017 18:24
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
> Now this is a interesting question. As it happens I /am/ blindly assuming
> the os would schedule it properly. I will look at my options this evening
> and see about addressing that.
>
> On Mar 3, 2017 4:52 PM, Keith Medcalf <[hidden email]> wrote:
>
> No, the good rule of thumb is to allocate one thread per CPU.  Depending
> on the particular multi-core CPU you "may" be able to use all the cores
> simultaneously but in many cases diminishing returns will set in long
> before you can execute one thread per core.  If this is an Intel processor
> that claims it has more than one thread per core be *extremely* careful as
> that will give you one thread and one half-assed thread per core.
> Sometimes, half-assed cores are presented as real cores when they lack a
> separate execution unit.  Be vary wary.
>
> I presume you are forcing separate threads to separate cores by setting
> processor affinity on the threads and not just blindly hoping that the OS
> scheduler does "the right thing"?
>
> > -----Original Message-----
> > From: sqlite-users [mailto:[hidden email]]
> > On Behalf Of Jens Alfke
> > Sent: Friday, 3 March, 2017 17:19
> > To: SQLite mailing list
> > Subject: Re: [sqlite] Massively multithreaded SQLite queries
> >
> >
> > > On Mar 3, 2017, at 3:51 PM, Simon Slavin <[hidden email]> wrote:
> > >
> > > Then mess with the '5' until you find a good value.
> >
> > A common rule of thumb with thread pools is to allocate one thread per
> CPU
> > core.
> >
> > —Jens
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> >
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglis
> ts.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-
> users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7Ca4729ccfb1754b9
> a534008d46298b2d9%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=C6yNkbM7p0u
> oib5OgXXYS7%2BEy8cJR1I6N8JjaQ4iMCU%3D&reserved=0
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglis
> ts.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-
> users&data=01%7C01%7Candrew.brown%40economicmodeling.com%7Ca4729ccfb1754b9
> a534008d46298b2d9%7C7cc1677566a34e8b80fd5b1f1db15061%7C0&sdata=C6yNkbM7p0u
> oib5OgXXYS7%2BEy8cJR1I6N8JjaQ4iMCU%3D&reserved=0
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Deon Brewis
In reply to this post by Keith Medcalf
Can you give an example of an Intel Processor SKU with half-assed cores?

There's HyperThreading of course, but I don't think anybody has ever considered HyperThreading to be separate cores.

- Deon

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Keith Medcalf
Sent: Friday, March 3, 2017 4:52 PM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Massively multithreaded SQLite queries


No, the good rule of thumb is to allocate one thread per CPU.  Depending on the particular multi-core CPU you "may" be able to use all the cores simultaneously but in many cases diminishing returns will set in long before you can execute one thread per core.  If this is an Intel processor that claims it has more than one thread per core be *extremely* careful as that will give you one thread and one half-assed thread per core.  Sometimes, half-assed cores are presented as real cores when they lack a separate execution unit.  Be vary wary.

I presume you are forcing separate threads to separate cores by setting processor affinity on the threads and not just blindly hoping that the OS scheduler does "the right thing"?

> -----Original Message-----
> From: sqlite-users
> [mailto:[hidden email]]
> On Behalf Of Jens Alfke
> Sent: Friday, 3 March, 2017 17:19
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
>
> > On Mar 3, 2017, at 3:51 PM, Simon Slavin <[hidden email]> wrote:
> >
> > Then mess with the '5' until you find a good value.
>
> A common rule of thumb with thread pools is to allocate one thread per
> CPU core.
>
> —Jens
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Massively multithreaded SQLite queries

Keith Medcalf

You must never have used a Microsoft Operating system ...

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Deon Brewis
> Sent: Saturday, 4 March, 2017 00:33
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
> Can you give an example of an Intel Processor SKU with half-assed cores?
>
> There's HyperThreading of course, but I don't think anybody has ever
> considered HyperThreading to be separate cores.
>
> - Deon
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Keith Medcalf
> Sent: Friday, March 3, 2017 4:52 PM
> To: SQLite mailing list <[hidden email]>
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
>
> No, the good rule of thumb is to allocate one thread per CPU.  Depending
> on the particular multi-core CPU you "may" be able to use all the cores
> simultaneously but in many cases diminishing returns will set in long
> before you can execute one thread per core.  If this is an Intel processor
> that claims it has more than one thread per core be *extremely* careful as
> that will give you one thread and one half-assed thread per core.
> Sometimes, half-assed cores are presented as real cores when they lack a
> separate execution unit.  Be vary wary.
>
> I presume you are forcing separate threads to separate cores by setting
> processor affinity on the threads and not just blindly hoping that the OS
> scheduler does "the right thing"?
>
> > -----Original Message-----
> > From: sqlite-users
> > [mailto:[hidden email]]
> > On Behalf Of Jens Alfke
> > Sent: Friday, 3 March, 2017 17:19
> > To: SQLite mailing list
> > Subject: Re: [sqlite] Massively multithreaded SQLite queries
> >
> >
> > > On Mar 3, 2017, at 3:51 PM, Simon Slavin <[hidden email]> wrote:
> > >
> > > Then mess with the '5' until you find a good value.
> >
> > A common rule of thumb with thread pools is to allocate one thread per
> > CPU core.
> >
> > —Jens
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12