Very, very slow commits

Very, very slow commits

Rob Willett
Hi,

Background

We've been refactoring our database to reduce the size of it. Through
some simple logic we've managed to pull out 99% of the data to reduce
the size from 51GB down to approx 600MB. This logic has been to remove
rows that are almost the same but not quite identical. As with all
things, the thinking was the difficult bit, the execution somewhat
easier.

As part of the testing we've been doing, we've now hit an odd
problem with the COMMIT statement. A commit of a few hundred
(circa 600-800) rows takes approx 7 seconds, whereas before we never even
noticed it, though we now know it was two seconds before. Each row is
probably 1-2K of data, so it's not very much at all.

Details of what we have tried:

1. We've turned synchronous on and off

PRAGMA synchronous=ON

and that's not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK; we get a consistent 69MB/sec.
This is not as fast as we would like, but it's the same across all our
virtual servers.

4. We've tested the commit on our existing 60GB database and it takes 2
seconds, which is longer than we thought it would be. The server for the
60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 14.04. The
server we are testing on is a 2GB/2 core test server running Ubuntu
16.04. Whilst the test server is smaller, we wouldn't expect it to take
3 times longer to do a commit.

5. The code is identical across the servers. We are running Perl and the
DBI module. The code for doing a commit in Perl::DBI is
  $dbh->do("COMMIT");

   We are getting the expected performance elsewhere on the system and
in the code. It's just the commit that is taking a long time.

6. The code we are committing is adding 600-800 rows to a table that
used to be 200,000,000 rows in size. It's now 400,000 rows in size. We
are wondering if the deletion of those rows has had an impact we didn't
expect. We have vacuumed and analysed the database.

The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
         "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
         "version" integer NOT NULL,
         "Disruption_id" INTEGER NOT NULL,
         "status" integer NOT NULL,
         "severity" integer NOT NULL,
         "levelOfInterest" integer NOT NULL,
         "category" integer NOT NULL,
         "subCategory" integer NOT NULL,
         "startTime" TEXT NOT NULL,
         "endTime" text NOT NULL,
         "location" integer NOT NULL,
         "corridor" integer NOT NULL,
         "comments" integer NOT NULL,
         "currentUpdate" integer NOT NULL,
         "remarkTime" TEXT NOT NULL,
         "lastModTime" TEXT NOT NULL,
         "CauseAreaPointX" real NOT NULL,
         "CauseAreaPointY" real NOT NULL,
         "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC,
"category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id"
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE
NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE
NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", "Disruption_id",
"Severity", "levelOfInterest", "category", "subCategory", "version");

We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob

Re: Very, very slow commits

Rob Willett
Update 1

We've copied the disruptions table to a new table, dropped the old
table, copied the new table back in and recreated all the indexes.

Exactly the same commit performance.

We've also tracked the -shm and -wal files and they are around 5MB in
size.

Mmmm.... tricky (as Deep Thought said).

Rob

Re: Very, very slow commits

John Found
What is the performance without the indexes?

--
John Found <[hidden email]>

Re: Very, very slow commits

Rob Willett
John,

Thanks for the prompt reply and a very good question.

We've dropped the indexes and the commit is now very quick, approx two
seconds.

However the overall performance of the run is much the same, as other
areas of the code are now significantly slower, whereas before they were
quick.

Where were you going with that question?

Thanks

Rob

On 29 Jul 2018, at 10:33, John Found wrote:

> What is the performance without the indexes?

Bug: Problem with sqlite3_prepare_v2

Jürgen Palm
In reply to this post by Rob Willett
Hi,

in an application I encountered a problem where a change to the
structure of a table, made by dropping and recreating it via one
connection to a db, was not directly seen by another connection to the
same db. I could reproduce the problem with the test program below.
Inserting a sqlite3_step (case 2) changes the output of the program,
which is actually the problem I have.

Could you please have a look at it?

SQLite version: 3.24 amalgamation
OS: Windows 10
Compiler: Visual Studio Community 2017

Output case 1:

Column count: 1 -> OK
Column count: 2 -> OK
Column count: 1 -> should be 2
Column count: 1 -> should be 2

Output case 2:

Column count: 1 -> OK
Column count: 2 -> OK
Column count: 1 -> should be 2
Column count: 2 -> OK

Test program:

#include <stdio.h>
#include "sqlite3.h"

int main()
{
   sqlite3* conn1;
   sqlite3* conn2;

   sqlite3_open("test.db", &conn1);

   // the next two lines are only for preparing the db. They are not
   // relevant for the problem.
   sqlite3_exec(conn1, "drop table if exists test", NULL, NULL, NULL);
   sqlite3_exec(conn1, "create table test(a text)",NULL,NULL,NULL);

   const char* select = "select * from test";
   sqlite3_stmt* stmt;

   sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);
   printf("Column count: %d\n", sqlite3_column_count(stmt));
   sqlite3_finalize(stmt);

   sqlite3_open("test.db", &conn2);

   sqlite3_exec(conn2, "drop table test;", NULL, NULL, NULL);
   sqlite3_exec(conn2, "create table test(a text,b text);", NULL, NULL,
NULL);

   sqlite3_prepare_v2(conn2, select, -1, &stmt, NULL);
   printf("Column count: %d\n", sqlite3_column_count(stmt));
   sqlite3_finalize(stmt);

   // closing the second connection can also be done at the end of the
   // program. The result doesn't change.
   sqlite3_close(conn2);

   sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);
   printf("Column count: %d\n", sqlite3_column_count(stmt));

   /********** only case 2 *******************/
   sqlite3_step(stmt);
   /******************************************/

   sqlite3_finalize(stmt);

   sqlite3_prepare_v2(conn1, select, -1, &stmt, NULL);
   printf("Column count: %d\n", sqlite3_column_count(stmt));
   sqlite3_finalize(stmt);

   sqlite3_close(conn1);

   return 0;
}

Regards,
Jürgen

Re: Very, very slow commits

R Smith-2
In reply to this post by Rob Willett
On 2018/07/29 11:47 AM, Rob Willett wrote:

> Where were you going with that question?

Possibly you have too many indexes.

An Index is an expensive thing for a DB to maintain; it's only ever
useful when the improvement gained for look-ups significantly outweighs
the cost of updating the Indexes.

The typical process here is to remove all Indexes, then add them back
one by one, running all queries each time and noting which add benefit and
which don't. Then, when all are installed, remove them again in the same
order (i.e. if you added A then B then C... start removing A then B then C
also) and measure again; you will quickly find the useless Indexes.
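
For example, one round of that loop using one of your existing indexes,
purely as an illustration (the timing itself would come from re-running
your insert/COMMIT job before and after):

    DROP INDEX IF EXISTS Disruptions_Idx5;
    -- re-run the insert/COMMIT workload here and note the timing, then
    -- put the index back and measure again:
    CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", "Disruption_id",
        "Severity", "levelOfInterest", "category", "subCategory", "version");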

This is the very last step in design though; it's the kind of
optimization everyone talks about when they warn against "premature
optimization". A prior step would be to study the queries and see if you
can find better Indexes, or ones that cover (i.e. are helpful with) a
wider range of queries, etc.

Apart from all that... did you add any triggers since the big DB? Which
thing is re-forming the previously "big" sets of data records into the
new streamlined set? If this is a Trigger or UDF, does that not eat any
time?


Cheers,
Ryan


Re: Very, very slow commits

R Smith-2
In reply to this post by Rob Willett
Also, kindly clarify one bit - I'm not sure whether you use the word
"commit" as an easy substitute for the entire process of updating the DB
(i.e. all SQL INSERT/UPDATE code that runs up to and including the COMMIT
statement), or if you actually mean the "COMMIT" operation itself. The
things that take time are the updating processes; the "COMMIT" function
is /typically/ a single quick file operation (depending on Journal mode
of course), and if it is not quick, that might give a clue towards the
problem.

Re: Very, very slow commits

J. King-3
In reply to this post by Rob Willett
Might ANALYZE help?
--
J. King

Re: Very, very slow commits

Djelf
In reply to this post by Rob Willett
Rob,

Try creating indexes without order.

Or, try adding a column containing a hash of the values that go into the
index, and search for the value via an index on that hash. This will
complicate the logic of your program, but it will decrease the volume of
the database, and possibly significantly speed up both reading and
writing.
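
Roughly along these lines, as a sketch only (the column and index names
are invented, and the hash itself would be computed by your application):

    -- sketch: one narrow indexed hash column instead of many wide indexes
    ALTER TABLE Disruptions ADD COLUMN "rowHash" INTEGER;
    CREATE INDEX "Disruptions_IdxHash" ON Disruptions ("rowHash");
    -- lookups then go through the single narrow index:
    SELECT * FROM Disruptions WHERE "rowHash" = :hash;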

---
Anton Azanov

Re: Very, very slow commits

Rob Willett
Anton,

Dropped the indexes and created them without order.

We'll need to look at what your second para means. It could be a major
and massive change.

Rob

Re: Very, very slow commits

Rob Willett
In reply to this post by J. King-3
Thanks for the mail.

We ran analyze with no indexes, made no difference.

We recreated the indexes and ran analyze again. The very long commit is
back, this time it took 14 secs :)

It does appear that the indexes have something to do with this, even
though they have not been an issue until now.

Rob

Re: Very, very slow commits

Simon Slavin-3
In reply to this post by Rob Willett
Please try moving your COLLATE clauses into the table definition.  e.g. instead of

    CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE NOCASE ASC);

Your table definition should have

         "version" integer NOT NULL COLLATE NOCASE,
         "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
...
         "location" integer NOT NULL COLLATE NOCASE,

and the index should be

    CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
        ("version" ASC, "Disruption_id" ASC, "location" ASC);

Once data has been entered, do ANALYZE.  This step may take a long time.
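
To double-check what you end up with, something like this should list the
collation actually used for each column of an index (the "coll" column in
the output):

    PRAGMA index_xinfo('Disruptions_Idx3');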

Simon.

Re: Bug: Problem with sqlite3_prepare_v2

Richard Hipp-3
In reply to this post by Jürgen Palm
On 7/29/18, Jürgen Palm <[hidden email]> wrote:
> Hi,
>
> in an application I encountered a problem, where the changing of the
> structure of a table by dropping and recreating it via one connection to
> a db was not directly seen by another connection to the same db.

Detecting a schema change requires starting a read transaction.  The
sqlite3_step() interface (usually) starts a read transaction, and so
it will detect the schema change, and will cause the statement to
automatically adjust to the new schema.  But sqlite3_column_count()
does *not* start a new transaction.  The sqlite3_column_count() bases
its answer on whatever schema was current during the last transaction for
that connection.


--
D. Richard Hipp
[hidden email]

Re: Bug: Problem with sqlite3_prepare_v2

Jürgen Palm-2
Thank you for the explanation. Although the behavior still appears a
little bit wrong to me, I can live with it, and I understand that it
might cause performance issues, for example for bulk operations, if a
schema change had to be checked for each statement even when it's not
necessary.

As a quick, ugly workaround for my situation I could add something like
   sqlite3_exec(conn1, "SELECT 1",NULL,NULL,NULL);
before the sqlite3_prepare_v2 after the schema change.

Is there any better way? Or would it be possible to add a SQLITE_PREPARE
flag to sqlite3_prepare_v3, which forces a schema change check?

Re: Very, very slow commits

droedel
In reply to this post by Rob Willett
Hi,

When having bad performance, I usually first try to find out if the slowness is due to disk throughput (sequential), slow random access or something else. In Linux, try "iostat -xtc 5". Do this with and without your application writing to disk.

If you see high CPU %iowait and high %util on your disk, then disk is the bottleneck. If not: start profiling / analyzing other bottlenecks (CPU / network / ...)

If the disk throughput (wMB/s) is close to your normal sequential throughput (69 MB/s): try to write less data or get a faster disk.
If the disk throughput is low, but there are high numbers of writes (w/s): there's too much seeking / too many small writes to your disk. Page cache too small? Checkpointing too often?
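
If checkpointing is a suspect, it can be checked and tuned from the SQLite side, e.g. (the 10000 is only an example value):

    PRAGMA wal_autocheckpoint;            -- default is 1000 pages
    PRAGMA wal_checkpoint(PASSIVE);       -- run a checkpoint explicitly at a quiet moment
    PRAGMA wal_autocheckpoint = 10000;    -- checkpoint less often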

Sometimes this kind of problem is caused by other applications (logging / ...) causing too much baseload. %util should be low when your application isn't running.

Just my 2 cents.

Kind regards,

Droedel

Re: Very, very slow commits

Rob Willett
Droedel,

Thanks for the comprehensive reply. We have actually done all of this.

The system has been running for 2-3 years and we have taken the
opportunity to try and prune the database from 60GB down to 600MB.
Currently the live system is working OK with a 60GB database, but our
test system (which is smaller) is struggling with 600MB.

The system has a restriction of IOPS as it's a Virtual Private Server.
Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec
with a disk to disk copy, which wouldn't be brilliant if we had sustained
disk traffic, which we don't.

We log approx 600 - 800 items of around 3-5K every five minutes. These
600-800 items are mainly an insert into a single table, there are other
things happening as well, but this is the bulk of the work. We can see
that the -wal files grow a small amount (4-5MB) just before the commit.
It then takes 7 seconds to execute the commit. This is the bit that
we're struggling with. We know we can get circa 70MB/sec data
throughput, so this should take a fraction of a second. Now SQLite needs
to work out which pages to commit, so that's a little longer, but we know
SQLite is fast, so that shouldn't take 7 seconds on the small database
as it doesn't take that long on the large 60GB database. That's the
puzzling bit: the large database is quick, the small one slow.

We have no logging turned on. We can turn SQL logging on at the DBI
level, but that turns a 20 sec run into a 2-3 minute run as it captures
everything :) Nothing in the log files gives us any concern (apart from
the really long commit time). Simon Slavin suggested dropping the
indexes, which we did; that turned the commit into a fast commit, so it's
something to do with the indexes but we can't see what.

What we are now doing is going back to the very beginning:

1. We built a replacement system yesterday with 8GB memory and 8 cores
and 150GB disk space. Its virtualised (ESXI) but under our control.
2. We've installed a copy of the old 60GB database on the new system.
3. We're going to benchmark the new system over a couple of thousand
runs to see what the average time is.
4. We'll then work our way through the deduping of the database step by
step to see when the commit time blows up. This will take a few days as
working out the duplications of 200,000,000 rows isn't that quick :) As
we found out, dropping a very large table is really, really, really
slow.
5. We'll apply some of the ideas that people have suggested since
yesterday to see if they work, but I'm keen that we have a repeatable
problem that we solve rather than use a scatter-gun approach to fixing
it. We think SQLite is well written, so we figure the problem is ours to
solve rather than simply blaming the software.


Thanks

Rob

Re: Very, very slow commits

Warren Young
On Jul 30, 2018, at 4:51 AM, Rob Willett <[hidden email]> wrote:
>
> The system has been running for 2-3 years

Has performance changed over that span?  Which direction?

> we have taken the opportunity to try and prune the database from 60GB down to 600MB.

SQLite’s speed is only weakly affected by database size.  A starting guess is log2(N) where N is file size.

Since your data access didn’t get 10x faster from your 100x size drop, you can fairly guess that the speed problem isn’t due to the inherent time required to traverse tree-based data structures in SQLite.

> The system has a restriction of IOPS as it's a Virtual Private Server. Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec with a disk to disk copy, which isn't brilliant if we had sustained disk traffic which we don’t.

I am sensing a spinning disk.  (Else, throughput should be a lot higher.)

SQLite takes data durability (the D in ACID) seriously, unlike most other software applications, so it is uncommon in that it flushes each transaction to disk before proceeding with further writes to that table.

A flush to disk takes a full disk rotation, and due to the way SQLite’s journal operates, each transaction requires two flushes.  That means that with a 7200 RPM disk, you can get a maximum of 60 transactions per second per table with SQLite.

Sound familiar?

If I’ve guessed the problem correctly, the solutions are:

1. Batch multiple writes in a transaction.

2. Switch to an SSD.

3. Use multiple tables and/or multiple DB files.  In your case, I’d suggest one SQLite DB per sensor, with one thread per sensor, each of which keeps one of the SQLite DBs open continuously.  That way, a blocked DB conn won’t block any other writers.

Those solutions are given in order of ease of application and cost of implementation.

> Nothing in the log files gives us any concern

Have you tried SQLite’s new .expert feature?

   https://sqlite.org/cli.html#index_recommendations_sqlite_expert_

> dropping a very large table is really, really, really slow.

If you put your main data table in a file of its own, you can quickly “drop” the table by just closing the DB and removing the DB file from disk.  

When you then recreate the DB file with a fresh schema, it’s effectively defragged/vacuumed as well.
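
One way to get the table into its own file is via ATTACH, roughly like this (the file and schema names are just illustrative, and CREATE TABLE ... AS SELECT does not carry over constraints or indexes, so those would need recreating):

    ATTACH DATABASE 'disruptions.db' AS dis;   -- separate file for the big table
    CREATE TABLE dis."Disruptions" AS SELECT * FROM main."Disruptions";
    -- later, "dropping" the table is just DETACH plus deleting disruptions.db:
    DETACH DATABASE dis;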

Re: Very, very slow commits

droedel
In reply to this post by Rob Willett
Hi Rob,

Is there significant read access (iostat: r/s) during these slow writes? If yes, it might be due to a small cache, requiring the database to read (index) pages before updating them.

And is the data you're adding in both databases (large/small) added at the end of the table using the autoincrement, or do you insert some items in the middle? I'm not a SQLite performance expert, but in other databases performance can be very different because in the former case fewer pages must be updated.

Microsoft SQL Server has something called "fill factor", basically the max percentage of an index page that is used during initial fill, which helps avoid too much page shuffling in the index when extra items are added. Disadvantage: it makes DBAs argue endlessly about the best fill factor ;-) Maybe there's something similar possible in SQLite, but I couldn't find a pragma for this.

Oh, and do both databases have the same page size, preferably 4K?
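
Both the page size and the page cache are quick to check on each box, e.g. (the cache value below is only an example):

    PRAGMA page_size;              -- bytes per page, ideally 4096
    PRAGMA cache_size;             -- negative values are KiB rather than pages
    PRAGMA cache_size = -200000;   -- roughly 200 MB of page cache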

Regards,

Droedel



On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:

> Droedel,
>
> Thanks for the comprehensive reply. We have actually done all of this.
>
> The system has been running for 2-3 years and we have taken the
> opportunity to try and prune the database from 60GB down to 600MB.
> Currently the live system is working OK with a 60GB database, but our
> test system (which is smaller) is struggling with 600MB.
>
> The system has a restriction of IOPS as it's a Virtual Private Server.
> Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec
> with a disk to disk copy, which isn't brilliant if we had sustained disk
> traffic which we don't.
>
> We log approx 600 - 800 items of around 3-5K every five minutes. These
> 600-800 items are mainly an insert into a single table, there are other
> things happening as well, but this is the bulk of the work. We can see
> that the -wal files grow a small amount (4-5MB) just before the commit.
> It then takes 7 seconds to execute the commit. This is the bit that
> we're struggling with. We know we can get circa 70MB/sec data
> throughput, so this should take a fraction of a second. Now SQLite needs
> to work out which pages to commit so thats a little longer, but we know
> SQLite is fast, so that shouldn't take 7 seconds on the small database
> as it doesn't take that long on the large 60GB database. Thats the
> puzzling bit, the large database is quick, the small one slow.
>
> We have no logging turned on, we can turn SQL logging on at the DBI
> level but that turns a 20 sec run into a 2-3 minute run as it captures
> everything :) Nothing in the log files gives us any concern (apart from
> the really long commit time). Simon Slavin suggested dropping the
> indexes which we did, that turned the commit into a fast commit, so its
> something to do with the indexes but we can't see what.
>
> What we are now doing is going back to the very beginning:
>
> 1. We built a replacement system yesterday with 8GB memory and 8 cores
> and 150GB disk space. Its virtualised (ESXI) but under our control.
> 2. We've installed a copy of the old 60GB database on the new system.
> 3. We're going to benchmark the new system over a couple of thousand
> runs to see what the average time is.
> 4. We'll then work our way through the deduping of the database step by
> step to see when the commit time blow up. This will take a few days as
> working out the duplications of 200,000,000 rows isn't that quick :) As
> we found out, dropping a very large table is really, really, really
> slow.
> 5. We'll apply some of the ideas that people have suggested since
> yesterday to see if they work, but I'm keen that we have a repeatable
> problem that we solve rather than we use a scatter gun approach to
> fixing it. We think SQLite is well written so we figure the problem is
> ours to solve rather than simply blaming the software.
>
>
> Thanks
>
> Rob
>
> On 30 Jul 2018, at 11:11, Droedel wrote:
>
> > Hi,
> >
> > When having bad performance, I usually first try to find out if the
> > slowness is due to disk througput (sequential), slow random access or
> > something else. In Linux, try "iostat -xtc 5". Do this with and
> > without your application writing to disk.
> >
> > If you see high CPU %iowait and high %util on your disk, then disk is
> > the bottleneck. If not: start profiling / analyzing other bottlenecks
> > (CPU / network / ...)
> >
> > If the disk throughput (wMB/s) is close to your normal sequential
> > throughput (69 MB/s): try to write less data or get a faster disk.
> > If the disk troughput is low, but high numbers of writes (w/s):
> > there's too much seeking / too many small writes to your disk. Page
> > cache too small ? Checkpointing too often ?
> >
> > Sometimes this kind of problems is caused by other applications
> > (logging / ...) causing too much baseload. %util should be low when
> > your application isn't running.
> >
> > Just my 2 cents.
> >
> > Kind regards,
> >
> > Droedel

Re: Very, very slow commits

Rob Willett
Droedel,

We don't think there is significant read access. The database is a
single database accessed by a single thread in a single process. The
only access to it is a Perl script that logs the incoming information.
We never have two accesses at the same time.

We have a nagging feeling (and that's all it is) about the
autoincrement value. We do use that feature in the table, but we have
deleted so many rows, and we may have messed something up as we moved
data from table to table. We did read https://www.sqlite.org/autoinc.html
a week or so ago and we wonder whether, having deleted circa
199,500,000 of the table's 200,000,000 rows, we have cocked it up
somehow. We have never directly accessed the internal sqlite_sequence
table (except to read it), but that page makes us wonder if we have
missed an error somewhere.
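
As a quick sanity check (just a sketch, with a placeholder file name),
we can compare the AUTOINCREMENT counter in sqlite_sequence with the
table itself:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:SQLite:dbname=disruptions.db", "", "",
                           { RaiseError => 1 });   # placeholder file name
    my ($seq) = $dbh->selectrow_array(
        "SELECT seq FROM sqlite_sequence WHERE name = 'Disruptions'");
    my ($max) = $dbh->selectrow_array("SELECT MAX(id) FROM Disruptions");

    # For an AUTOINCREMENT table, seq holds the largest id ever issued, so
    # it should be >= MAX(id) even after mass deletes; new ids keep rising.
    printf "sqlite_sequence.seq = %d, MAX(id) = %d\n", $seq // 0, $max // 0;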

We did wonder if we are filling up pages in the middle or something.
However, we expected the vacuum and analyse to sort that out. It's
entirely possible that we have had this problem before now, hence we
are going back to the very beginning with our 61GB database and taking
it step by step so we understand what is happening. What's frustrating
is that the 61GB database seems fast and the 600MB one seems slow, yet
they both have the same database schema. We feel that we have made an
error somewhere but have only now discovered it.

We've built the replica test system now and we're going to run some
initial checks and get some benchmarks in place.
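
The benchmark itself will probably be little more than a
high-resolution timer around the commit, along these lines (a sketch;
the file name is a placeholder and the batch of INSERTs is elided):

    use strict;
    use warnings;
    use DBI;
    use Time::HiRes qw(gettimeofday tv_interval);

    my $dbh = DBI->connect("dbi:SQLite:dbname=copy_of_60gb.db", "", "",
                           { RaiseError => 1 });
    $dbh->do("BEGIN");
    # ... the usual batch of 600-800 INSERTs goes here ...
    my $t0 = [gettimeofday];
    $dbh->do("COMMIT");
    printf "COMMIT took %.3f seconds\n", tv_interval($t0);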

It could be an interesting and exciting ride :)

Rob



Re: Very, very slow commits

Rob Willett
In reply to this post by Warren Young
Warren,

On 30 Jul 2018, at 12:28, Warren Young wrote:

> On Jul 30, 2018, at 4:51 AM, Rob Willett
> <[hidden email]> wrote:
>>
>> The system has been running for 2-3 years
>
> Has performance changed over that span?  Which direction?

Performance hasn't changed on the large 60GB database. It's pretty
consistent.

>
>> we have taken the opportunity to try and prune the database from 60GB
>> down to 600MB.
>
> SQLite’s speed is only weakly affected by database size.  A starting
> guess is log2(N) where N is file size.
>
> Since your data access didn’t get 10x faster from your 100x size
> drop, you can fairly guess that the speed problem isn’t due to the
> inherent time required to traverse tree-based data structures in
> SQLite.
>

The reason for the pruning is not for the benefit of SQLite, it's for
our own administration. Backing up 60GB files is hard work, and we're
also struggling to fire up replicas, so we wanted to look at the
database as a whole and get the size down. The 99% reduction is great;
we were only hoping for a 50% to 80% reduction.

>> The system has a restriction of IOPS as it's a Virtual Private
>> Server. Technically it's running Ubuntu 16.04 under OpenVZ. We can
>> get 69MB/sec with a disk to disk copy, which isn't brilliant if we
>> had sustained disk traffic which we don’t.
>
> I am sensing a spinning disk.  (Else, throughput should be a lot
> higher.)

I suspect that is the case. However, 69MB/sec is adequate for the 60GB
database and for normal usage. Our commits are fine on the 60GB
database, just not on the 600MB one.

>
> SQLite takes data durability (the D in ACID) seriously, unlike most
> other software applications, so it is uncommon in that it flushes each
> transaction to disk before proceeding with further writes to that
> table.
>
> A flush to disk takes a full disk rotation, and due to the way
> SQLite’s journal operates, each transaction requires two flushes.  
> That means that with a 7200 RPM disk, you can get a maximum of 60
> transactions per second per table with SQLite.

I agree with your logic, but I wonder why writing the data to the 60GB
database and doing a commit is fast, while writing exactly the same
data to the 600MB database is slow. The programs doing it are the
same, and the database schema is identical.

>
> Sound familiar?
>
> If I’ve guessed the problem correctly, the solutions are:
>
> 1. Batch multiple writes in a transaction.

Already done.

>
> 2. Switch to an SSD.

Not an option in the short term for production, BUT the test system we
have set up has an SSD. This may skew the results though. It's
difficult to get an identical system set up; the other option is a
VMware Fusion system on a Mac with a spinning disk. Most of our local
systems are SSDs; we have a Mac with a spinning disk for backup.

>
> 3. Use multiple tables and/or multiple DB files.  In your case, I’d
> suggest one SQLite DB per sensor, with one thread per sensor, each of
> which keeps one of the SQLite DBs open continuously.  That way, a
> blocked DB conn won’t block any other writers.

We have one process that reads in a single file every five minutes. No
need for multiple databases or multiple threads.

>
> Those solutions are given in order of ease of application and cost of
> implementation.
>
>> Nothing in the log files gives us any concern
>
> Have you tried SQLite’s new .expert feature?

No, but we will now :)
>
>    https://sqlite.org/cli.html#index_recommendations_sqlite_expert_
>
>> dropping a very large table is really, really, really slow.
>
> If you put your main data table in a file of its own, you can quickly
> “drop” the table by just closing the DB and removing the DB file
> from disk.

That's what we will do on our test system.

>
> When you then recreate the DB file with a fresh schema, it’s
> effectively defragged/vacuumed as well.

We have a copy of the 60GB data file locally now (it took a long time
to download). We will use this (or rather a copy of it) to start the
testing.

Thanks very much for the thoughtful and useful comments.

Rob
