FW: Questions about your "Performance Matters" talk re SQLite

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

FW: Questions about your "Performance Matters" talk re SQLite

Doug
I asked for some information from Emery Berger about his video talk on performance where he said they got a 25% improvement in SQLite performance. Here is the reply I got back.

 

I know there has been a lot of talk about what can and cannot be done with the C calling interface because of compatibility issues and the myriad set of wrappers on various forms. I’m having a hard time letting go of a possible 25% performance improvement.

 

I don’t have the slightest idea on how to run a benchmark (but I could learn). I wonder if the current set of benchmarks used by SQLite developers actually measure throughput using wall-clock numbers. It might be a good idea to put a wrapper around all the benchmarks to capture how long they took to run (wall-clock), and include things like number and type of cpu cores, average cpu busy time, and other relevant numbers. If the benchmarks are run on lots of different machines (all over the world?), it would provide an excellent view of what changes in SQLite made a difference in performance.

 

Doug

 

From: Curtsinger, Charlie <[hidden email]>
Sent: Thursday, January 02, 2020 10:55 AM
To: [hidden email]
Cc: Emery D Berger <[hidden email]>
Subject: Re: Questions about your "Performance Matters" talk re SQLite

 

Hello Doug,

 

I was able to track down the sqlite benchmark I ran for the paper, and I’ve checked it into the github repository at https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite. This benchmark creates 64 threads that operate on independent tables in the sqlite database, performing operations that should be almost entirely independent. This benchmark exposes contention inside of sqlite, since running it with a larger number of hardware threads will hurt performance. I see a performance improvement of nearly 5x when I run this on a two-core linux VM versus a 64-thread Xeon machine, since there are fewer opportunities for the threads to interfere with each other.

 

You can also find the modified version of sqlite with the same benchmark at https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite-modified. There are just a few changes from indirect to direct calls in the sqlite3.c file.

 

I reran the experiment on the same machine we used for the original Coz paper, and saw a performance improvement of around 20% with the modified version of sqlite. That’s slightly less than what we originally found, but I didn’t do many runs (just five) and there’s quite a bit of variability. The compiler has been upgraded on this machine as well, so there could be some effect there as well. On a much-newer 64-thread Xeon machine I see a difference of just 5%, still in favor of the modified version of sqlite. That’s not terribly surprising, since Intel has baked a lot of extra pointer-chasing and branch prediction smarts into processors in the years since we set up the 64-core AMD machine we originally used for the Coz benchmarks.

 

As far as measuring performance, I’d encourage you *not* to use cpu cycles as a proxy for runtime. Dynamic frequency scaling can mess up these measurements, especially if the clock frequency is dropped in response to the program’s behavior. Putting many threads to sleep might allow the OS to drop the CPU frequency, thereby reducing the number of CPU cycles. That doesn’t mean the program will actually run in a shorter wall clock time. Some CPUs have a hardware event that counts “clock cycles” at a constant rate even with frequency scaling, but these are really just high-precision timers and would be perfectly fine for measuring runtime. I’m thinking of the “ref-cycles” event from perf here.

 

Hope this helps,

- Charlie

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Barry Smith
One thing that really stands is “creates 64 threads that operate on independent tables in the sqlite database, performing operations that should be almost entirely independent.”

But that’s not how SQLite works - at least not when writing data. SQLite takes a lock on the entire database, there is no fine granularity locking that allows you to perform simultaneous writes to different tables.

It seems attempting to do this - use 64 threads to write to a database - is a highly inefficient use of the library.

> On 2 Jan 2020, at 1:54 pm, Doug <[hidden email]> wrote:
>
> I asked for some information from Emery Berger about his video talk on performance where he said they got a 25% improvement in SQLite performance. Here is the reply I got back.
>
>
>
> I know there has been a lot of talk about what can and cannot be done with the C calling interface because of compatibility issues and the myriad set of wrappers on various forms. I’m having a hard time letting go of a possible 25% performance improvement.
>
>
>
> I don’t have the slightest idea on how to run a benchmark (but I could learn). I wonder if the current set of benchmarks used by SQLite developers actually measure throughput using wall-clock numbers. It might be a good idea to put a wrapper around all the benchmarks to capture how long they took to run (wall-clock), and include things like number and type of cpu cores, average cpu busy time, and other relevant numbers. If the benchmarks are run on lots of different machines (all over the world?), it would provide an excellent view of what changes in SQLite made a difference in performance.
>
>
>
> Doug
>
>
>
> From: Curtsinger, Charlie <[hidden email]>
> Sent: Thursday, January 02, 2020 10:55 AM
> To: [hidden email]
> Cc: Emery D Berger <[hidden email]>
> Subject: Re: Questions about your "Performance Matters" talk re SQLite
>
>
>
> Hello Doug,
>
>
>
> I was able to track down the sqlite benchmark I ran for the paper, and I’ve checked it into the github repository at https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite. This benchmark creates 64 threads that operate on independent tables in the sqlite database, performing operations that should be almost entirely independent. This benchmark exposes contention inside of sqlite, since running it with a larger number of hardware threads will hurt performance. I see a performance improvement of nearly 5x when I run this on a two-core linux VM versus a 64-thread Xeon machine, since there are fewer opportunities for the threads to interfere with each other.
>
>
>
> You can also find the modified version of sqlite with the same benchmark at https://github.com/plasma-umass/coz/tree/master/benchmarks/sqlite-modified. There are just a few changes from indirect to direct calls in the sqlite3.c file.
>
>
>
> I reran the experiment on the same machine we used for the original Coz paper, and saw a performance improvement of around 20% with the modified version of sqlite. That’s slightly less than what we originally found, but I didn’t do many runs (just five) and there’s quite a bit of variability. The compiler has been upgraded on this machine as well, so there could be some effect there as well. On a much-newer 64-thread Xeon machine I see a difference of just 5%, still in favor of the modified version of sqlite. That’s not terribly surprising, since Intel has baked a lot of extra pointer-chasing and branch prediction smarts into processors in the years since we set up the 64-core AMD machine we originally used for the Coz benchmarks.
>
>
>
> As far as measuring performance, I’d encourage you *not* to use cpu cycles as a proxy for runtime. Dynamic frequency scaling can mess up these measurements, especially if the clock frequency is dropped in response to the program’s behavior. Putting many threads to sleep might allow the OS to drop the CPU frequency, thereby reducing the number of CPU cycles. That doesn’t mean the program will actually run in a shorter wall clock time. Some CPUs have a hardware event that counts “clock cycles” at a constant rate even with frequency scaling, but these are really just high-precision timers and would be perfectly fine for measuring runtime. I’m thinking of the “ref-cycles” event from perf here.
>
>
>
> Hope this helps,
>
> - Charlie
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Richard Hipp-3
On 1/2/20, Barry Smith <[hidden email]> wrote:
> One thing that really stands is “creates 64 threads that operate on
> independent tables in the sqlite database, performing operations that should
> be almost entirely independent.”
>

Looking at the main.c file
(https://github.com/plasma-umass/coz/blob/master/benchmarks/sqlite/main.c)
it appears that the test creates 64 separate database connections,
each with a separate in-memory database.

There are two sources of contention here:

(1) SQLite keeps track of the total amount of memory it is using on
all threads.  So for each malloc() and free() it has to take a mutex
to increase or decrease the counters.  This is probably the primary
source of contention.  It can be disabled by running:

    sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);

early in main(), before any other SQLite interface calls.  Make that
one change and I suspect that most of the thread contention will go
away.

(2) SQLite has a single PRNG used by all threads.  And so there is a
mutex that has to be taken whenever a new random number is generated.
But the workload does not appear to be using any random numbers, so I
doubt that this is an actual problem in this case.

> I’d encourage you *not* to use cpu cycles as a proxy for runtime. Dynamic frequency
> scaling can mess up these measurements, especially if the clock frequency is dropped
> in response to the program’s behavior.

The task requires X number of CPU cycles *regardless* of the clock
frequency.  If the clock slows down, then it takes more elapse time to
run those X cycles, but it does not increase or decrease the number of
cycles required.  So in that sense, counting the number of CPU cycles
is an excellent measure of effort required to complete the
computation.

Furthermore, the idea that thread contention will cause the CPU clock
to slow down seems silly.  Technically, I suppose such a think might
actually happen - IF you do all of your work as multiple threads
within the same process and they all blocked on the same resource.
The point is, you shouldn't do that.  Instead of one process with 64
threads, how about 64 processes with one thread each.  Since they are
all doing different things (serving independent HTTP requests, for
example) they might as well each have their own address space.
Keeping each job in a separate process provides isolation for added
security.  And it completely eliminates the need for mutexes and the
accompanying thread contention.

If SQLite runs faster for you when you make direct calls to
pthread_mutex_lock() rather than indirect calls, how much faster would
it run if you completely eliminated all calls to pthread_mutex_lock()
by putting each task in a separate process?


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Keith Medcalf

Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s) performance increase.  
Changing the threading mode or the indirection level of the mutexes calls seems to have no significant effect.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Richard Hipp
>Sent: Thursday, 2 January, 2020 16:00
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] FW: Questions about your "Performance Matters" talk
>re SQLite
>
>On 1/2/20, Barry Smith <[hidden email]> wrote:
>> One thing that really stands is “creates 64 threads that operate on
>> independent tables in the sqlite database, performing operations that
>should
>> be almost entirely independent.”
>>
>
>Looking at the main.c file
>(https://github.com/plasma-
>umass/coz/blob/master/benchmarks/sqlite/main.c)
>it appears that the test creates 64 separate database connections,
>each with a separate in-memory database.
>
>There are two sources of contention here:
>
>(1) SQLite keeps track of the total amount of memory it is using on
>all threads.  So for each malloc() and free() it has to take a mutex
>to increase or decrease the counters.  This is probably the primary
>source of contention.  It can be disabled by running:
>
>    sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
>
>early in main(), before any other SQLite interface calls.  Make that
>one change and I suspect that most of the thread contention will go
>away.
>
>(2) SQLite has a single PRNG used by all threads.  And so there is a
>mutex that has to be taken whenever a new random number is generated.
>But the workload does not appear to be using any random numbers, so I
>doubt that this is an actual problem in this case.
>
>> I’d encourage you *not* to use cpu cycles as a proxy for runtime.
>Dynamic frequency
>> scaling can mess up these measurements, especially if the clock
>frequency is dropped
>> in response to the program’s behavior.
>
>The task requires X number of CPU cycles *regardless* of the clock
>frequency.  If the clock slows down, then it takes more elapse time to
>run those X cycles, but it does not increase or decrease the number of
>cycles required.  So in that sense, counting the number of CPU cycles
>is an excellent measure of effort required to complete the
>computation.
>
>Furthermore, the idea that thread contention will cause the CPU clock
>to slow down seems silly.  Technically, I suppose such a think might
>actually happen - IF you do all of your work as multiple threads
>within the same process and they all blocked on the same resource.
>The point is, you shouldn't do that.  Instead of one process with 64
>threads, how about 64 processes with one thread each.  Since they are
>all doing different things (serving independent HTTP requests, for
>example) they might as well each have their own address space.
>Keeping each job in a separate process provides isolation for added
>security.  And it completely eliminates the need for mutexes and the
>accompanying thread contention.
>
>If SQLite runs faster for you when you make direct calls to
>pthread_mutex_lock() rather than indirect calls, how much faster would
>it run if you completely eliminated all calls to pthread_mutex_lock()
>by putting each task in a separate process?
>
>
>--
>D. Richard Hipp
>[hidden email]
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Howard Chu
Keith Medcalf wrote:
>
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s) performance increase.  
> Changing the threading mode or the indirection level of the mutexes calls seems to have no significant effect.
>
Goes to show - publishing benchmark results without investigating why they are what
they are is mostly pointless. When you suspect mutex contention is a significant
factor, you should use something like mutrace to confirm your suspicion first.

Fundamentally there ought to be no performance difference between running a 64-threaded
server on a 64-threaded CPU vs 64 single-threaded processes. In practice, the single
process with 64 threads ought to be slightly faster, due to less context switch overhead
between threads, but if nothing else in the system is contending for CPU then context
switching shouldn't even be an issue.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

skywalk
In reply to this post by Keith Medcalf
I get SQLITE_MISUSE when attempting
sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
immediately after opening a db connection?
My connection has THREADSAFE = 1.

On Thu, Jan 2, 2020 at 7:32 PM Keith Medcalf <[hidden email]> wrote:

>
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
> performance increase.
> Changing the threading mode or the indirection level of the mutexes calls
> seems to have no significant effect.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-----Original Message-----
> >From: sqlite-users <[hidden email]> On
> >Behalf Of Richard Hipp
> >Sent: Thursday, 2 January, 2020 16:00
> >To: SQLite mailing list <[hidden email]>
> >Subject: Re: [sqlite] FW: Questions about your "Performance Matters" talk
> >re SQLite
> >
> >On 1/2/20, Barry Smith <[hidden email]> wrote:
> >> One thing that really stands is “creates 64 threads that operate on
> >> independent tables in the sqlite database, performing operations that
> >should
> >> be almost entirely independent.”
> >>
> >
> >Looking at the main.c file
> >(https://github.com/plasma-
> >umass/coz/blob/master/benchmarks/sqlite/main.c)
> >it appears that the test creates 64 separate database connections,
> >each with a separate in-memory database.
> >
> >There are two sources of contention here:
> >
> >(1) SQLite keeps track of the total amount of memory it is using on
> >all threads.  So for each malloc() and free() it has to take a mutex
> >to increase or decrease the counters.  This is probably the primary
> >source of contention.  It can be disabled by running:
> >
> >    sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
> >
> >early in main(), before any other SQLite interface calls.  Make that
> >one change and I suspect that most of the thread contention will go
> >away.
> >
> >(2) SQLite has a single PRNG used by all threads.  And so there is a
> >mutex that has to be taken whenever a new random number is generated.
> >But the workload does not appear to be using any random numbers, so I
> >doubt that this is an actual problem in this case.
> >
> >> I’d encourage you *not* to use cpu cycles as a proxy for runtime.
> >Dynamic frequency
> >> scaling can mess up these measurements, especially if the clock
> >frequency is dropped
> >> in response to the program’s behavior.
> >
> >The task requires X number of CPU cycles *regardless* of the clock
> >frequency.  If the clock slows down, then it takes more elapse time to
> >run those X cycles, but it does not increase or decrease the number of
> >cycles required.  So in that sense, counting the number of CPU cycles
> >is an excellent measure of effort required to complete the
> >computation.
> >
> >Furthermore, the idea that thread contention will cause the CPU clock
> >to slow down seems silly.  Technically, I suppose such a think might
> >actually happen - IF you do all of your work as multiple threads
> >within the same process and they all blocked on the same resource.
> >The point is, you shouldn't do that.  Instead of one process with 64
> >threads, how about 64 processes with one thread each.  Since they are
> >all doing different things (serving independent HTTP requests, for
> >example) they might as well each have their own address space.
> >Keeping each job in a separate process provides isolation for added
> >security.  And it completely eliminates the need for mutexes and the
> >accompanying thread contention.
> >
> >If SQLite runs faster for you when you make direct calls to
> >pthread_mutex_lock() rather than indirect calls, how much faster would
> >it run if you completely eliminated all calls to pthread_mutex_lock()
> >by putting each task in a separate process?
> >
> >
> >--
> >D. Richard Hipp
> >[hidden email]
> >_______________________________________________
> >sqlite-users mailing list
> >[hidden email]
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Keith Medcalf
On Friday, 3 January, 2020 09:30, [hidden email] wrote:

>I get SQLITE_MISUSE when attempting
>sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
>immediately after opening a db connection?
>My connection has THREADSAFE = 1.

That is correct.  You must configure the library before it is initialized, not after.

https://sqlite.org/c3ref/config.html

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Eric Grange-3
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
performance increase.
> Changing the threading mode or the indirection level of the mutexes calls
seems to have no significant effect.

That is quite significant.
Looking at the code, it seems the mutex requirement is mostly for the soft
heap limit, maybe there could be a way to use atomic instruction to
maintain the stats ?
The highwater stat could be handled by a CAS (loop), and this should make
the memstatus data cheaper overall when soft heap limit is not required.

(my use case involved a hundreds of independent SQLite databases from a
single process, I have not benchmarked yet, but I am probably hitting that
mutex hard as well)

Eric




Le ven. 3 janv. 2020 à 17:36, Keith Medcalf <[hidden email]> a écrit :

> On Friday, 3 January, 2020 09:30, [hidden email] wrote:
>
> >I get SQLITE_MISUSE when attempting
> >sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
> >immediately after opening a db connection?
> >My connection has THREADSAFE = 1.
>
> That is correct.  You must configure the library before it is initialized,
> not after.
>
> https://sqlite.org/c3ref/config.html
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

skywalk
In reply to this post by Keith Medcalf
Is there a query function for these and other config settings?
I see no sqlite3_config_get() in sqlite3.h.

On Fri, Jan 3, 2020 at 11:36 AM Keith Medcalf <[hidden email]> wrote:

> On Friday, 3 January, 2020 09:30, [hidden email] wrote:
>
> >I get SQLITE_MISUSE when attempting
> >sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);
> >immediately after opening a db connection?
> >My connection has THREADSAFE = 1.
>
> That is correct.  You must configure the library before it is initialized,
> not after.
>
> https://sqlite.org/c3ref/config.html
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Richard Hipp-3
On 1/3/20, [hidden email] <[hidden email]> wrote:
> Is there a query function for these and other config settings?
> I see no sqlite3_config_get() in sqlite3.h.

There is no query function for the SQLITE_CONFIG_MEMSTATUS setting.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Keith Medcalf
In reply to this post by skywalk

On Friday, 3 January, 2020 11:32, [hidden email] wrote:

> Is there a query function for these and other config settings?
> I see no sqlite3_config_get() in sqlite3.h.

No.  There are config options to get specific config data where that might be useful.  
Otherwise, you simply set the configuration to match your requirements.

That is, if you need to have configuration X, Y, and Z then just set it the way you want it.  Either the configuration is set that way or you get an error return (you do check return codes right, that is what they are for).  There is not really much point in doing "if memstatus is on turn it off" when you can just turn it off.
 
--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

skywalk
Well, I told you I'm getting SQLITE_MISUSE so that kinda answers your side
question?
I am only interested in this topic for the performance gain so quoted.
I need to create a personal test case(my use model) to verify these
statements.
Querying the config state is helpful for a dll wrapped database, but I can
adapt without the '_get()' function.

On Fri, Jan 3, 2020 at 2:02 PM Keith Medcalf <[hidden email]> wrote:

>
> On Friday, 3 January, 2020 11:32, [hidden email] wrote:
>
> > Is there a query function for these and other config settings?
> > I see no sqlite3_config_get() in sqlite3.h.
>
> No.  There are config options to get specific config data where that might
> be useful.
> Otherwise, you simply set the configuration to match your requirements.
>
> That is, if you need to have configuration X, Y, and Z then just set it
> the way you want it.  Either the configuration is set that way or you get
> an error return (you do check return codes right, that is what they are
> for).  There is not really much point in doing "if memstatus is on turn it
> off" when you can just turn it off.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Questions about your "Performance Matters" talk re SQLite

Tim Streater-3
On 03 Jan 2020, at 22:08, sky5walk <[hidden email]> wrote:

> Querying the config state is helpful for a dll wrapped database, ...

What's one of them?


--
Cheers  --  Tim
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Questions about your "Performance Matters" talk re SQLite

skywalk
haha, that is a mangled way of saying I wrapped my db functions in a dll
for multiple app use. I did not expose this config setting as I never knew
its impact. To be honest, I still don't. ;)

On Fri, Jan 3, 2020 at 5:18 PM Tim Streater <[hidden email]> wrote:

> On 03 Jan 2020, at 22:08, sky5walk <[hidden email]> wrote:
>
> > Querying the config state is helpful for a dll wrapped database, ...
>
> What's one of them?
>
>
> --
> Cheers  --  Tim
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Jens Alfke-2
In reply to this post by Doug

> On Jan 2, 2020, at 11:54 AM, Doug <[hidden email]> wrote:
>
> I know there has been a lot of talk about what can and cannot be done with the C calling interface because of compatibility issues and the myriad set of wrappers on various forms. I’m having a hard time letting go of a possible 25% performance improvement.

This was a heavily multithreaded benchmark (64 threads accessing the same connection) on a very hefty server-class CPU. From Dr Hipp’s results, it sounds like the speed up may be only in similar situations, not to more normal SQLite usage.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

J Decker
In reply to this post by Howard Chu
On Sat, Jan 4, 2020 at 2:59 AM Howard Chu <[hidden email]> wrote:

> Keith Medcalf wrote:
> >
> > Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s)
> performance increase.
> > Changing the threading mode or the indirection level of the mutexes
> calls seems to have no significant effect.
> >
> Goes to show - publishing benchmark results without investigating why they
> are what
> they are is mostly pointless. When you suspect mutex contention is a
> significant
> factor, you should use something like mutrace to confirm your suspicion
> first.
>
> Fundamentally there ought to be no performance difference between running
> a 64-threaded
> server on a 64-threaded CPU vs 64 single-threaded processes. In practice,
> the single
> process with 64 threads ought to be slightly faster, due to less context
> switch overhead
> between threads, but if nothing else in the system is contending for CPU
> then context
> switching shouldn't even be an issue.
>

but, if the mutex lock always does a 'sched_yield' when the locking fails,
then a failure to lock isn't just a stutter in the thread, but can result
in at least 2 more context switches every time.  Probably something simpler
like a simple InterlockedExchange() call in a loop could maintain locking,
and affect performance much less; since it sounds like the actual lock
duration is really only a few clocks anyway.  Could wish there was a way to
pause execution without giving up execution context...


>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Doug
In reply to this post by Jens Alfke-2
Thanks, Jens. I got it. The benchmark sounds like it isn't a real benchmark, but a made-up scenario to exercise the Coz code. I've let go now.
Doug

> -----Original Message-----
> From: sqlite-users <[hidden email]>
> On Behalf Of Jens Alfke
> Sent: Friday, January 03, 2020 10:58 PM
> To: SQLite mailing list <[hidden email]>
> Cc: [hidden email]; [hidden email]
> Subject: Re: [sqlite] FW: Questions about your "Performance
> Matters" talk re SQLite
>
>
> > On Jan 2, 2020, at 11:54 AM, Doug <[hidden email]> wrote:
> >
> > I know there has been a lot of talk about what can and cannot be
> done with the C calling interface because of compatibility issues
> and the myriad set of wrappers on various forms. I’m having a hard
> time letting go of a possible 25% performance improvement.
>
> This was a heavily multithreaded benchmark (64 threads accessing
> the same connection) on a very hefty server-class CPU. From Dr
> Hipp’s results, it sounds like the speed up may be only in similar
> situations, not to more normal SQLite usage.
>
> —Jens
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Doug
In reply to this post by J Decker
> -----Original Message-----
> From: sqlite-users <[hidden email]>
> On Behalf Of J Decker
> Sent: Saturday, January 04, 2020 4:11 AM
>
> Could wish there was a way to
> pause execution without giving up execution context...

What about?
for (i=1000; i--; i>0);

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Questions about your "Performance Matters" talk re SQLite

Simon Slavin-3
On 4 Jan 2020, at 5:49pm, Doug <[hidden email]> wrote:

> for (i=1000; i--; i>0);

If you have optimization turned on, your compiler might turn that into "i = 0".  Optimization messes with a lot of benchmarks.  Checking which optimization setting was used is one aspect of what Howard Chu was talking about: benchmark figures in isolation don't mean much.

I'm not a big fan of general statements about which form is faster, uses less memory, whatever, than which other form.  I want to say "try both ways and keep the one which is best".
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: FW: Questions about your "Performance Matters" talk re SQLite

Keith Medcalf
In reply to this post by Doug

Well, actually, no.  

It was a single process that spins up 64 threads each of which accesses its own per-thread in-memory database using an in-that-thread-only-in-thread-database-connection-in-that-thread.

Making some simple modifications (changing the number of threads to 6 and the insertions/thread to 1,000,000 -- so as to consume no more resources than are available (this is on a 4-core Xeon (2 SMT threads per core) 32 GB RAM Win10 1909) yields interesting results (CPU usage was about 90% for all of them, though of course the linear runs consumed only 1 thread on one core so about 12.5% CPU):

SERIALIZED   MEMSTATUS ON   Elapsed Time :  00:00:19.944  MPR=0.75
SERIALIZED   MEMSTATUS OFF  Elapsed Time :  00:00:04.408  MPR=3.97
MULTITHREAD  MEMSTATUS ON   Elapsed Time :  00:00:19.815  MPR=0.72
MULTITHREAD  MEMSTATUS OFF  Elapsed Time :  00:00:04.456  MPR=2.88

*MPR is the Multiprogramming Ratio:  (elapsed time running the workload linearly)/(elapsed time running the workload in parallel)

In this case the "linear" time is obtained by running the same test with the pthread_join immediately following the pthread_create so that only one thread of the workload runs at a time, one after each.

So I conclude that SERIALIZED/MULTITHREAD makes very little difference and that MEMSTATUS ON/OFF makes a huge difference.  Since the difference between MUTITHREAD and SERIALIZED is a thread mutex on the connection, I can conclude that with zero contention for that mutex, whether it is being used or not makes little difference and that the fact that it is, when used, a double-indexed indirect call through a global data area, leads me to believe that the benefit derived from replacing that double-indexed indirect call with an immediate call cannot possibly have any significant effect if the elimination of that call entirely has negligible effect.

From the MPR _for_this_workload_on_this_particular_CPU_ I can conclude that if one wishes to have MEMSTATUS enabled, then using multiple threads is detrimental to performance and that linear processing is significantly more efficient.  However, when MEMSTATUS is turned off, then serialized mode leads to significant increased multiprogramming benefit.

GCC 8.1.0 (MinGW-x64)
gcc -m64 -mwin32 -mconsole -mthreads -O3 -s -pipe -Wl,-Bstatic,--nxcompat,--dynamicbase test.c -ID:\Source\bld -ID:\Source\bld\tsrc -Ld:\source\bld\gcc\64 -lsqlite3.dll -o test.exe -static-libgcc -lpthread

sqlite3.dll compiled using:
gcc -s -O3 -pipe -D_HAVE_SQLITE_CONFIG_H -DSQLITE_EXTRA_INIT=core_init -DSQLITE_HAVE_ZLIB -Itsrc -march=native -mtune=native -m64 -mdll -mthreads -Wl,-Bstatic,--nxcompat,--dynamicbase,--high-entropy-va,--image-base,0x180000000,--out-implib,gcc/64/libsqlite3.dll.a,--output-def,gcc/64/sqlite3.def sqlite3.c sqlite3.def -ladvapi32 -lrpcrt4 -lwinmm -lz -static-libgcc -o gcc/64/sqlite3.dll

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Doug
>Sent: Saturday, 4 January, 2020 10:42
>To: 'SQLite mailing list' <[hidden email]>
>Subject: Re: [sqlite] FW: Questions about your "Performance Matters" talk
>re SQLite
>
>Thanks, Jens. I got it. The benchmark sounds like it isn't a real
>benchmark, but a made-up scenario to exercise the Coz code. I've let go
>now.
>Doug
>
>> -----Original Message-----
>> From: sqlite-users <[hidden email]>
>> On Behalf Of Jens Alfke
>> Sent: Friday, January 03, 2020 10:58 PM
>> To: SQLite mailing list <[hidden email]>
>> Cc: [hidden email]; [hidden email]
>> Subject: Re: [sqlite] FW: Questions about your "Performance
>> Matters" talk re SQLite
>>
>>
>> > On Jan 2, 2020, at 11:54 AM, Doug <[hidden email]> wrote:
>> >
>> > I know there has been a lot of talk about what can and cannot be
>> done with the C calling interface because of compatibility issues
>> and the myriad set of wrappers on various forms. I’m having a hard
>> time letting go of a possible 25% performance improvement.
>>
>> This was a heavily multithreaded benchmark (64 threads accessing
>> the same connection) on a very hefty server-class CPU. From Dr
>> Hipp’s results, it sounds like the speed up may be only in similar
>> situations, not to more normal SQLite usage.
>>
>> —Jens
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>> users
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12