Fwd: Write performance question for 3.7.15

Dan Frankowski
I am running a benchmark of inserting 100 million (100M) items into a
table. I am seeing performance I don't understand. Graph:
http://imgur.com/hH1Jr. Can anyone explain:

1. Why does write speed (writes/second) slow down dramatically around 28M
items?
2. Are there parameters (perhaps related to table size) that would change
this write performance?

=======================

Create and insert statements:

create table if not exists t_foo (
  key binary(16) primary key,
  value binary(16));

insert or replace into t_foo (key, value) values (?, ?)

key and value are each 16-byte arrays.

I turn auto-commit off and commit every 1000 inserts.
I set synchronous mode to OFF and journaling mode to WAL (write-ahead log).
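
In SQL terms, the pattern amounts to something like this (a sketch of the
statement stream, not the Xerial driver's literal output):

PRAGMA journal_mode=WAL;
PRAGMA synchronous=OFF;

BEGIN;
insert or replace into t_foo (key, value) values (?, ?);
-- ... 999 more inserts ...
COMMIT;
BEGIN;
-- ... next batch of 1000 ...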

I am using sqlite 3.7.15 through the Xerial JDBC driver (see
https://bitbucket.org/xerial/sqlite-jdbc). I built it myself, due to a
glibc incompatibility (see
https://groups.google.com/d/msg/Xerial/F9roGuUjH6c/6RuxqmG6UK4J).

I am running on Gentoo. Output of uname -a:

Linux mymachine 3.2.1-c42.31 #1 SMP Mon Apr 30 10:55:12 CDT 2012 x86_64
Quad-Core AMD Opteron(tm) Processor 1381 AuthenticAMD GNU/Linux

It has 8G of memory.
Re: Write performance question for 3.7.15

Dan Frankowski
On Fri, Dec 28, 2012 at 3:34 PM, Dan Frankowski <[hidden email]> wrote:

> I am running a benchmark of inserting 100 million (100M) items into a
> table. I am seeing performance I don't understand. Graph:
> http://imgur.com/hH1Jr. Can anyone explain:
>
> 1. Why does write speed (writes/second) slow down dramatically around 28M
> items?
> 2. Are there parameters (perhaps related to table size) that would change
> this write performance?
>

3. Would horizontal partitioning (i.e. creating multiple tables, each for a
different key range) help?
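
For concreteness, I mean something like this (a sketch; the table names and
prefix scheme are made up):

create table if not exists t_foo_00 (key binary(16) primary key, value binary(16));
create table if not exists t_foo_01 (key binary(16) primary key, value binary(16));
-- ... one table per leading key byte, with the writer choosing
-- a table based on the first byte of the key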
Re: Fwd: Write performance question for 3.7.15

Michael Black
In reply to this post by Dan Frankowski
Perhaps the rowid index cache gets too big?  I assume you don't have any
indexes of your own?

Does the knee change if you, say, double your cache_size?

The default is 2000:

pragma cache_size=4000;
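
Note cache_size is measured in pages, and the default page size in this era
is 1024 bytes, so even a doubled cache is only a few megabytes:

pragma page_size;        -- 1024 by default
pragma cache_size=4000;  -- 4000 pages * 1024 bytes = roughly 4MB of cache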


Re: Fwd: Write performance question for 3.7.15

Stephen Chrzanowski
My guess would be the OS slowing things down with write caching.  The
system will hold only so much data in memory as a cache to write to the disk,
and when that cache gets full, the OS slows down and waits on the HDD.  Try
doing a [dd] of a few gigabytes of random data and see if you get the same
kind of slowdown.
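
Something along these lines (a sketch; the output path is arbitrary,
conv=fdatasync makes dd flush to disk before exiting, and /dev/zero can stand
in if /dev/urandom itself turns out to be the bottleneck):

dd if=/dev/urandom of=/tmp/ddtest bs=1M count=4096 conv=fdatasync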

Re: Write performance question for 3.7.15

Simon Slavin-3

On 29 Dec 2012, at 12:37pm, Stephen Chrzanowski <[hidden email]> wrote:

> My guess would be the OS slowing things down with write caching.  The
> system will hold only so much data in memory as a cache to write to the disk,
> and when that cache gets full, the OS slows down and waits on the HDD.  Try
> doing a [dd] of a few gigabytes of random data and see if you get the same
> kind of slowdown.

Makes sense.  That would reveal how much memory the operating system is using for caching.  Once you hit 30M rows you exceed the amount of memory the system is using for caching, and it has to start reading or writing disk for every operation, which is far slower.  Or it's the amount of memory that the operating system is allowing the benchmarking process to use.  Or some other OS limitation.

But the underlying point of our responses is that it's not a decision built into SQLite.  There's nothing in SQLite which says to use a fast strategy for up to 25M rows and then a slower one from then on.

A good way to track it down would be to close the database at the point where performance starts to tank, and look at how big the file is.  That size should give a clue about which resource the OS is limiting.  Another test would be to add an extra unindexed column to the test database and fill it with a fixed text string in each row, as sketched below.  If this changes the number of rows before the cliff edge, then the slowdown depends on total filesize.  If it doesn't, then it depends on the size of the index being searched for each INSERT.
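
Concretely, the second test could look like this (a sketch; the pad column
and its contents are placeholders):

create table if not exists t_foo (
  key binary(16) primary key,
  value binary(16),
  pad text);

insert or replace into t_foo (key, value, pad)
  values (?, ?, 'a fixed string of a few dozen characters');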

Simon.
Re: Fwd: Write performance question for 3.7.15

Valentin Davydov-2
In reply to this post by Dan Frankowski
On Fri, Dec 28, 2012 at 03:34:02PM -0600, Dan Frankowski wrote:
> I am running a benchmark of inserting 100 million (100M) items into a
> table. I am seeing performance I don't understand. Graph:
> http://imgur.com/hH1Jr. Can anyone explain:
>
> 1. Why does write speed (writes/second) slow down dramatically around 28M
> items?

Most probably, the indices became too large to fit in the in-memory cache.
You can verify this by tracing system activity: the threshold should
manifest itself as a drastic increase in _read_ operations on the disk(s).
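
For example, iostat from the sysstat package reports per-device read and
write rates (a sketch; watch the read columns as the insert count approaches
the knee):

iostat -x 1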

> 2. Are there parameters (perhaps related to table size) that would change
> this write performance?

CACHE_SIZE. It makes sense to enlarge it up to all the available memory.
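
For example (a sketch; since 3.7.10 a negative value sets the cache size in
kibibytes rather than pages, so on this 8G machine the following asks for
roughly 4 gigabytes of page cache):

PRAGMA cache_size=-4000000;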

Valentin Davydov.
Re: Write performance question for 3.7.15

Valentin Davydov-2
In reply to this post by Dan Frankowski
On Fri, Dec 28, 2012 at 03:35:17PM -0600, Dan Frankowski wrote:
>
> 3. Would horizontal partitioning (i.e. creating multiple tables, each for a
> different key range) help?

This would seriously impair read performance (you'd have to access two indices
instead of one).

Valentin Davydov.
Re: Write performance question for 3.7.15

Michael Black
In reply to this post by Simon Slavin-3
I wrote a C program doing your thing (with random data, so each key is
effectively unique).

I see some small knees at 20M and 23M -- but nothing like what you're seeing,
as long as I don't do the COMMIT.
Seems the COMMIT is what's causing the sudden slowdown.
When doing the COMMIT I see your dramatic slowdown (an order of magnitude) at
around 5M records... regardless of cache size... so cache size isn't the
problem.
I'm guessing the COMMIT is paging out the index, which starts thrashing the
disk.
Increasing the COMMIT interval to every 100,000 seems to help a lot.  The
plot looks almost like an EKG then, with regular slowdowns.


And... when not doing the COMMIT, is it normal for memory usage to grow the
way the WAL file does?


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include "sqlite3.h"

time_t base_seconds;
suseconds_t base_useconds;

// starts the stopwatch that toc() reads
void tic() {
  struct timeval tv;
  gettimeofday(&tv,NULL);
  base_seconds=tv.tv_sec;
  base_useconds=tv.tv_usec;
}

// returns time in seconds since tic() was called
double toc() {
  struct timeval tv;
  gettimeofday(&tv,NULL);
  double mark=(tv.tv_sec-base_seconds)+(tv.tv_usec-base_useconds)/1.0e6;
  return mark;
}

// complains (and optionally exits) if rc is not the expected result code
void checkrc(sqlite3 *db,int expected,int rc,int fatal,char *msg,char *str) {
  if (rc != expected) {
    fprintf(stderr,msg,str);
    fprintf(stderr,"%s\n",sqlite3_errmsg(db));
    if (fatal) {
      exit(1);
    }
  }
}

int main(int argc, char *argv[]) {
  int rc;
  long i;
  char *sql,*errmsg=NULL;
  char *databaseName="data.db";
  sqlite3 *db;
  sqlite3_stmt *stmt1,*stmt2;
  remove(databaseName); // start from an empty database each run
  rc=sqlite3_open_v2(databaseName,&db,
                     SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE,NULL);
  checkrc(db,SQLITE_OK,rc,1,"Error opening database '%s': ",databaseName);
  sql="create table if not exists t_foo "
      "(key binary(16) primary key, value binary(16))";
  rc=sqlite3_prepare_v2(db,sql,-1,&stmt1,NULL);
  checkrc(db,SQLITE_OK,rc,1,"Error preparing statement '%s': ",sql);
  rc=sqlite3_step(stmt1);
  checkrc(db,SQLITE_DONE,rc,1,"Error executing statement '%s': ",sql);
  rc=sqlite3_finalize(stmt1);
  checkrc(db,SQLITE_OK,rc,1,"Error finalizing statement '%s': ",sql);
  sql="PRAGMA journal_mode=WAL";
  rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error on WAL mode statement '%s': ",sql);
  sql="PRAGMA synchronous=OFF";
  rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error on synchronous mode statement '%s': ",sql);
  sql="PRAGMA cache_size=10"; // deliberately tiny page cache; vary to test
  rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error on cache size statement '%s': ",sql);
  sql="BEGIN";
  rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
  checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
  sql="insert or replace into t_foo(key,value) values(?,?)";
  rc=sqlite3_prepare_v2(db,sql,-1,&stmt2,NULL);
  checkrc(db,SQLITE_OK,rc,1,"Error preparing statement '%s': ",sql);
  tic();
  for(i=0; i<50000000; ++i) {
    char key[16],value[16];
    long number=random();
    if (i>0 && (i % 100000)==0) { // report inserts/second every 100K rows
      printf("%ld,%g\n",i,100000/toc());
      tic();
    }
#if 0 // COMMIT?
    if (i>0 && (i % 1000)==0) { // try 100,000
      sql="COMMIT";
      rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
      checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
      sql="BEGIN";
      rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
      checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
    }
#endif
    // 16-byte key built from one random number; random() yields ~31 bits,
    // so the occasional duplicate key is simply replaced
    memcpy(key,&number,8);
    memcpy(&key[8],&number,8);
    memset(value,0,sizeof(value)); // zero the upper 8 bytes of the value
    memcpy(value,&i,8);
    rc=sqlite3_bind_blob(stmt2,1,key,16,SQLITE_STATIC);
    checkrc(db,SQLITE_OK,rc,1,"Error binding key for statement '%s': ",sql);
    rc=sqlite3_bind_blob(stmt2,2,value,16,SQLITE_STATIC);
    checkrc(db,SQLITE_OK,rc,1,"Error binding value for statement '%s': ",sql);
    rc=sqlite3_step(stmt2);
    checkrc(db,SQLITE_DONE,rc,1,"Error stepping statement '%s': ",sql);
    rc=sqlite3_reset(stmt2);
    checkrc(db,SQLITE_OK,rc,1,"Error resetting statement '%s': ",sql);
  }
  sqlite3_finalize(stmt2);
  sqlite3_exec(db,"COMMIT",NULL,NULL,&errmsg); // flush the open transaction
  sqlite3_close(db);
  return 0;
}


Re: Write performance question for 3.7.15

Michael Black
Referencing the C program I sent earlier... I've found a COMMIT every 1M
records does best.  I had an extra zero on my 100,000, which is what gave the
EKG appearance.
I averaged 25,000 inserts/sec over 50M records with no big knees in the
performance (there is a noticeable knee on the commit, though, around 12M
records).  But the average performance curve is pretty smooth.
Commit more often than that and you're flushing out the index too often,
which causes an awful lot of disk thrashing, it would seem.
During the 1M commit the CPU drops to a couple % and the disk I/O is pretty
constant...albeit slow....
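
In the program I posted, that's a one-line change to the commit cadence
(sketch):

    if (i>0 && (i % 1000000)==0) { // COMMIT every 1M rows instead of 1000
      sql="COMMIT";
      rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
      checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
      sql="BEGIN";
      rc=sqlite3_exec(db,sql,NULL,NULL,&errmsg);
      checkrc(db,SQLITE_OK,rc,1,"Error executing statement '%s': ",sql);
    }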

P.S. I'm using 3.7.15.1


Re: Write performance question for 3.7.15

Simon Slavin-3

On 29 Dec 2012, at 9:45pm, Michael Black <[hidden email]> wrote:

> During the 1M commit the CPU drops to a couple % and the disk I/O is pretty
> constant...albeit slow....

For the last few years, since multi-core processors have become common, SQLite performance has usually been limited by the performance of storage.  Several times I've recommended to users that, rather than pouring their money and development effort into unintuitive programming (e.g. splitting one TABLE into smaller ones), they just upgrade from spinning disks to SSDs.

Simon.
Re: Write performance question for 3.7.15

Dan Frankowski
In reply to this post by Dan Frankowski
I appreciate everyone's thoughts about this.

Knowing that larger batch sizes help is interesting. Unfortunately, we don't
always control the batch size. We're using 1000 as an optimistic estimate,
but we receive things and may just have to commit after a while.

Knowing that more OS file cache or a faster disk helps is also interesting.
Unfortunately, it is non-trivial for us to switch to SSDs. We will have a
whole fleet of machines, each storing several hundred terabytes. The SQLite
databases are metadata about that data. We might be able to use one SSD just
for the metadata; we haven't explored that yet. We also can't count on lots
of OS disk cache, as it will probably be taken up by writes of things other
than this metadata.

Still, all of your observations are useful.

We are comparing to LevelDB, which seems to have much better write
performance even in a limited-memory situation. Of course it offers much
less than SQLite: it is a partially-ordered key/value store, rather than a
relational database.

Re: Write performance question for 3.7.15

Richard Hipp-3
On Tue, Jan 1, 2013 at 12:33 PM, Dan Frankowski <[hidden email]> wrote:

>
> We are comparing to LevelDB, which seems to have much better write
> performance even in a limited-memory situation. Of course it offers much
> less than SQLite: it is a partially-ordered key/value store, rather than a
> relational database.
>

The default LSM storage layer for SQLite4 gives much better performance
than LevelDB on average.  Most LevelDB inserts are a little faster than
LSM's; however, every now and then LevelDB encounters a really, really slow
insert.  SQLite4's LSM avoids these spikes and hence is able to perform
significantly faster in the long run.  SQLite4's LSM also gives you
concurrent access and transactions - capabilities that are missing from
LevelDB.

SQLite4 gives you all the high-level schema and querying capabilities of
SQLite3, with enhancements.

OTOH, SQLite4 is not anything close to being production-ready at this time.


--
D. Richard Hipp
[hidden email]
Re: Write performance question for 3.7.15

Dan Frankowski
Ah, interesting. However, yes, we need production-ready. Good luck with
SQLite4, though.
