Very, very slow commits

Re: Very, very slow commits

Warren Young
On Jul 30, 2018, at 5:53 AM, Rob Willett <[hidden email]> wrote:
>
> I would wonder why writing the data to a 60GB database and doing a commit is fast and writing exactly the same data to the 600MB database is different. The programs for doing it are the same, the database schema is identical.

I assume the hardware is different.  Is that not the case?

If the small DB is on a machine with a spinning disk but the large DB is on a machine with either an SSD or a many-spindled RAID, there’s your key difference.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Very, very slow commits

Rob Willett
Warren,

The hardware is different; at the time we didn't want to spin up a
complete production replica as that's quite expensive, so we used a
smaller machine. Both have the same type of back-end spinning RAID
array, but we wouldn't expect writing 4-5MB of changed data back to
take 7 seconds. We had seen far better performance on the slower
machine earlier in testing.

We will go back to step one and work our way through, step by step,
from 60GB to 600MB, as our thinking is that we have somehow screwed our
database up.

Rob
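To make that step-by-step comparison repeatable, a small timing harness helps. This is a hedged sketch, not the thread's production code (which is Perl/DBI); Python's stdlib sqlite3 is used only to keep the example self-contained, and the column list follows the Disruptions schema quoted later in the thread.

```python
import sqlite3
import time

# Column list taken from the Disruptions schema posted in this thread.
COLUMNS = ("version, Disruption_id, status, severity, levelOfInterest, "
           "category, subCategory, startTime, endTime, location, corridor, "
           "comments, currentUpdate, remarkTime, lastModTime, "
           "CauseAreaPointX, CauseAreaPointY")

def timed_commit(db_path, rows):
    """Insert one batch inside an explicit transaction, timing COMMIT alone."""
    con = sqlite3.connect(db_path, isolation_level=None)  # manual transactions
    con.execute("PRAGMA journal_mode=WAL")  # match the setting described here
    con.execute("BEGIN")
    placeholders = ",".join("?" * 17)     # 17 columns in COLUMNS above
    con.executemany(
        f"INSERT INTO Disruptions ({COLUMNS}) VALUES ({placeholders})", rows)
    start = time.perf_counter()
    con.execute("COMMIT")                 # the step that is unexpectedly slow
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed
```

Running this a few thousand times against copies of both the 60GB and 600MB files, with a batch of 600-800 rows each time, would give directly comparable averages.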


On 30 Jul 2018, at 13:29, Warren Young wrote:

> [snip]

Re: Very, very slow commits

droedel
In reply to this post by Rob Willett
Hi Rob,
Answers are in the text below

On Mon, Jul 30, 2018, at 13:45, Rob Willett wrote:
> Droedel,
>
> We don't think there is significant read access. The database is a
> single database on a single thread on a single process. The only access
> to it is a Perl script that logs the incoming information. We never have
> two accesses at the same time.

Can you also _measure_ read access, preferably at system level on your production database, e.g. by using iostat? I've seen cases where (other, non-SQLite) databases had unexpected disk access patterns due to an application error.

> We have a nagging feeling (and that's all it is) about the autoincrement
> value. We do use that feature in the table, but we have deleted so many

[snip]
If both databases use autoincrement, then performance should be similar (or at least that's what I expect). Can you easily check whether the newly generated IDs are as expected and larger than any existing ID in your table?
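For an AUTOINCREMENT table, SQLite keeps the high-water mark in the internal sqlite_sequence table, so this check can be automated. A hedged sketch using Python's stdlib sqlite3 (the table name matches the Disruptions schema posted in this thread):

```python
import sqlite3

def autoincrement_check(db_path, table="Disruptions"):
    """Return (largest live id, AUTOINCREMENT high-water mark)."""
    con = sqlite3.connect(db_path)
    (max_id,) = con.execute(
        f'SELECT COALESCE(MAX(id), 0) FROM "{table}"').fetchone()
    row = con.execute(
        "SELECT seq FROM sqlite_sequence WHERE name = ?", (table,)).fetchone()
    con.close()
    seq = row[0] if row else 0
    # With AUTOINCREMENT, seq never decreases even after mass deletes,
    # so new ids are always allocated above it: seq >= max_id is expected.
    return max_id, seq
```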

> We did wonder if we are filling up pages in the middle or something.
> However we expected the vacuum and analyse to sort this out. Now its

[snip]
sqlite3_analyzer can give some measurements, e.g. unused bytes on index pages.
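If building sqlite3_analyzer proves awkward, a few built-in pragmas give a much coarser but zero-setup picture of wasted space. A sketch, again in Python's stdlib sqlite3 (the pragma names are standard SQLite):

```python
import sqlite3

def space_summary(db_path):
    """Coarse substitute for sqlite3_analyzer: file size and free pages."""
    con = sqlite3.connect(db_path)
    pragma = lambda name: con.execute(f"PRAGMA {name}").fetchone()[0]
    summary = {
        "page_size": pragma("page_size"),
        "page_count": pragma("page_count"),
        "free_pages": pragma("freelist_count"),  # reclaimable by VACUUM
    }
    con.close()
    summary["db_bytes"] = summary["page_size"] * summary["page_count"]
    return summary
```

It won't show unused bytes inside index pages the way sqlite3_analyzer does, but it is enough to spot a bloated freelist after mass deletes.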

> We've built the replica test system now and we're going to have some
> initial checks and get some benchmarks in place.
>
> It could be an interesting and exciting ride :)

Sure. It's always fun learning something new. But it's less fun in full production when customers are yelling :-(

Regards,

Droedel


>
> Rob
>
>
> On 30 Jul 2018, at 12:32, Droedel wrote:
>
> > Hi Rob,
> >
> > Is there significant read access (iostat: r/s) during these slow
> > writes ? If yes, it might be due to a small cache, requiring the
> > database to read (index) pages before updating them.
> >
> > And is the data you're adding in both databases (large/small) added at
> > the end of the table using the autoincrement, or do you insert some
> > items in the middle ? I'm not a SQLite performance expert, but in
> > other databases performance can be very different because in the
> > former case fewer pages must be updated.
> >
> > Microsoft SQL Server has something called "fill factor", basically the
> > max percentage of an index page that is used during initial fill,
> > which helps avoiding too many page shuffling in the index when extra
> > items are added. Disadvantage: it makes DBAs argue endlessly about the
> > best fill factor ;-) Maybe there's something similar possible in
> > SQLite but I couldn't find a pragma for this.
> >
> > Oh, and do both databases have the same page size, preferably 4K ?
> >
> > Regards,
> >
> > Droedel
> >
> >
> >
> > On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:
> >> Droedel,
> >>
> >> Thanks for the comprehensive reply. We have actually done all of
> >> this.
> >>
> >> The system has been running for 2-3 years and we have taken the
> >> opportunity to try and prune the database from 60GB down to 600MB.
> >> Currently the live system is working OK with a 60GB database, but our
> >> test system (which is smaller) is struggling with 600MB.
> >>
> >> The system has a restriction on IOPS as it's a Virtual Private
> >> Server. Technically it's running Ubuntu 16.04 under OpenVZ. We can
> >> get 69MB/sec with a disk-to-disk copy, which isn't brilliant, but we
> >> don't have sustained disk traffic.
> >>
> >> We log approx 600-800 items of around 3-5K every five minutes. These
> >> 600-800 items are mainly inserts into a single table; there are
> >> other things happening as well, but this is the bulk of the work. We
> >> can see that the -wal file grows a small amount (4-5MB) just before
> >> the commit.
> >> It then takes 7 seconds to execute the commit. This is the bit that
> >> we're struggling with. We know we can get circa 70MB/sec data
> >> throughput, so this should take a fraction of a second. Now SQLite
> >> needs to work out which pages to commit, so that's a little longer,
> >> but we know SQLite is fast, so that shouldn't take 7 seconds on the
> >> small database, as it doesn't take that long on the large 60GB
> >> database. That's the puzzling bit: the large database is quick, the
> >> small one slow.
> >>
> >> We have no logging turned on. We can turn SQL logging on at the DBI
> >> level, but that turns a 20-sec run into a 2-3 minute run as it
> >> captures everything :) Nothing in the log files gives us any concern
> >> (apart from the really long commit time). Simon Slavin suggested
> >> dropping the indexes, which we did; that turned the commit into a
> >> fast commit, so it's something to do with the indexes, but we can't
> >> see what.
> >>
> >> What we are now doing is going back to the very beginning:
> >>
> >> 1. We built a replacement system yesterday with 8GB memory, 8 cores
> >> and 150GB disk space. It's virtualised (ESXi) but under our control.
> >> 2. We've installed a copy of the old 60GB database on the new system.
> >> 3. We're going to benchmark the new system over a couple of thousand
> >> runs to see what the average time is.
> >> 4. We'll then work our way through the deduping of the database step
> >> by step to see when the commit time blows up. This will take a few
> >> days, as working out the duplications of 200,000,000 rows isn't that
> >> quick :) As we found out, dropping a very large table is really,
> >> really, really slow.
> >> 5. We'll apply some of the ideas that people have suggested since
> >> yesterday to see if they work, but I'm keen that we have a repeatable
> >> problem that we solve rather than use a scatter-gun approach to
> >> fixing it. We think SQLite is well written, so we figure the problem
> >> is ours to solve rather than simply blaming the software.
> >>
> >>
> >> Thanks
> >>
> >> Rob
> >>
> >> On 30 Jul 2018, at 11:11, Droedel wrote:
> >>
> >>> Hi,
> >>>
> >>> When performance is bad, I usually first try to find out whether the
> >>> slowness is due to disk throughput (sequential), slow random access,
> >>> or something else. In Linux, try "iostat -xtc 5". Do this with and
> >>> without your application writing to disk.
> >>>
> >>> If you see high CPU %iowait and high %util on your disk, then disk
> >>> is the bottleneck. If not, start profiling / analyzing other
> >>> bottlenecks (CPU / network / ...).
> >>>
> >>> If the disk throughput (wMB/s) is close to your normal sequential
> >>> throughput (69 MB/s): try to write less data or get a faster disk.
> >>> If the disk throughput is low but the number of writes (w/s) is
> >>> high, there's too much seeking / too many small writes to your disk.
> >>> Page cache too small? Checkpointing too often?
> >>>
> >>> Sometimes this kind of problem is caused by other applications
> >>> (logging / ...) creating too much baseload. %util should be low when
> >>> your application isn't running.
> >>>
> >>> Just my 2 cents.
> >>>
> >>> Kind regards,
> >>>
> >>> Droedel
> >>>
> >>>
> >>> On Sun, Jul 29, 2018, at 10:14, Rob Willett wrote:
> >>>> Hi,
> >>>>
> >>>> Background
> >>>>
> >>>> We've been refactoring our database to reduce its size. Through
> >>>> some simple logic we've managed to pull out 99% of the data,
> >>>> reducing the size from 51GB down to approx 600MB. This logic has
> >>>> been to remove rows that are almost the same but not quite
> >>>> identical. As with all things, the thinking was the difficult bit;
> >>>> the execution was somewhat easier.
> >>>>
> >>>> As part of the testing we've been doing, we've now hit on an odd
> >>>> and weird problem to do with the COMMIT statement. A commit of a
> >>>> few hundred (circa 600-800) rows takes approx 7 seconds, whereas
> >>>> before we never even noticed it, though we now know it was two
> >>>> seconds before. Each row is probably 1-2K of data, so it's not
> >>>> very much at all.
> >>>>
> >>>> Details of what we have tried:
> >>>>
> >>>> 1. We've turned synchronous on and off
> >>>>
> >>>> PRAGMA synchronous=ON
> >>>>
> >>>> and that's not made any difference.
> >>>>
> >>>> 2. We are using and have been using WAL mode for years.
> >>>>
> >>>> PRAGMA journal_mode;
> >>>> journal_mode
> >>>> wal
> >>>>
> >>>> 3. We've tested that the server copies OK; we get a consistent
> >>>> 69MB/sec. This is not as fast as we would like, but it's the same
> >>>> across all our virtual servers.
> >>>>
> >>>> 4. We've tested the commit on our existing 60GB database and it
> >>>> takes 2 seconds, which is longer than we thought it would be. The
> >>>> server for the 60GB database is a large VPS with 8GB/8 cores
> >>>> running Ubuntu 14.04. The server we are testing on is a 2GB/2-core
> >>>> test server running Ubuntu 16.04. Whilst the test server is
> >>>> smaller, we wouldn't expect it to take 3 times longer to do a
> >>>> commit.
> >>>>
> >>>> 5. The code is identical across the servers. We are running Perl
> >>>> and the DBI module. The code for doing a commit in Perl::DBI is
> >>>>   $dbh->do("COMMIT");
> >>>>
> >>>> We are getting the expected performance elsewhere on the system
> >>>> and in the code. It's just the commit that is taking a long time.
> >>>>
> >>>> 6. The code we are committing adds 600-800 rows to a table that
> >>>> used to be 200,000,000 rows in size. It's now 400,000 rows in
> >>>> size. We are wondering if the deletion of the rows has had an
> >>>> impact we didn't expect. We have vacuumed and analysed the
> >>>> database.
> >>>>
> >>>> The schema for the table we insert into is
> >>>>
> >>>> CREATE TABLE IF NOT EXISTS "Disruptions" (
> >>>> "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
> >>>> "version" integer NOT NULL,
> >>>> "Disruption_id" INTEGER NOT NULL,
> >>>> "status" integer NOT NULL,
> >>>> "severity" integer NOT NULL,
> >>>> "levelOfInterest" integer NOT NULL,
> >>>> "category" integer NOT NULL,
> >>>> "subCategory" integer NOT NULL,
> >>>> "startTime" TEXT NOT NULL,
> >>>> "endTime" text NOT NULL,
> >>>> "location" integer NOT NULL,
> >>>> "corridor" integer NOT NULL,
> >>>> "comments" integer NOT NULL,
> >>>> "currentUpdate" integer NOT NULL,
> >>>> "remarkTime" TEXT NOT NULL,
> >>>> "lastModTime" TEXT NOT NULL,
> >>>> "CauseAreaPointX" real NOT NULL,
> >>>> "CauseAreaPointY" real NOT NULL,
> >>>> "Direction" TEXT
> >>>> );
> >>>> CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
> >>>> NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE
> >>>> ASC,
> >>>> "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
> >>>> "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> >>>> "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> >>>> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions
> >>>> ("Disruption_id"
> >>>> COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category"
> >>>> COLLATE
> >>>> NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> >>>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
> >>>> COLLATE
> >>>> NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE
> >>>> NOCASE ASC);
> >>>> CREATE INDEX Disruptions_Idx5 ON Disruptions ("status",
> >>>> "Disruption_id",
> >>>> "Severity", "levelOfInterest", "category", "subCategory",
> >>>> "version");
> >>>>
> >>>> We have checked that this schema is consistent across the
> >>>> databases.
> >>>>
> >>>> We're about to recreate the table to see if that makes a
> >>>> difference.
> >>>>
> >>>> Any help or advice welcomed.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Rob

Re: Very, very slow commits

Rob Willett
Droedel,

Fortunately we have no performance issues in production with the 60GB
database. These issues came out in testing (which is what testing is
for).

We're investigating the newly generated IDs as we speak, or we will be
once we get our replica production system set up.

sqlite3_analyzer has been a problem for us; we've struggled to get a
build of it.

Rob

On 30 Jul 2018, at 13:49, Droedel wrote:

> [snip]

Re: Bug: Problem with sqlite3_prepare_v2

Richard Hipp-3
In reply to this post by Jürgen Palm-2
On 7/29/18, Jürgen Palm <[hidden email]> wrote:
> As a quick, ugly workaround for my situation I could add something like
>    sqlite3_exec(conn1, "SELECT 1",NULL,NULL,NULL);
> before the sqlite3_prepare_v2 after the schema change.
>
> Is there any better way? Or would it be possible to add a SQLITE_PREPARE
> flag to sqlite3_prepare_v3, which forces a schema change check?

You only need to do this for sqlite3_column_count().  The
sqlite3_prepare_v3() interface might use an obsolete version of the
schema, but that will be detected and the prepare will automatically
rerun itself at the first sqlite3_step() on the prepared statement.

--
D. Richard Hipp
[hidden email]

Re: Very, very slow commits

Rob Willett
In reply to this post by Simon Slavin-3
Simon,

Apologies for taking so long to get back; we've been building a test
system and it's taken a long time.

We're just getting round to trying your ideas out to see what difference
they make.

We've created a new table based on your ideas, moved the COLLATE clauses
into the table definition, and analysed the database. We did **not** add
COLLATE NOCASE to the columns which are defined as integers. Would that
make a difference?

We've found it now takes around 10% longer to do the queries than
before.

Rob
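For reference, a hedged sketch of the mechanics of this change (Python stdlib sqlite3, and only an illustrative subset of the Disruptions columns): declaring COLLATE NOCASE once on a text column means indexes and comparisons pick it up without repeating the clause per index column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Collation declared once on the column, not repeated in every index.
con.execute("""
    CREATE TABLE Disruptions (
        id            INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        version       INTEGER NOT NULL,
        Disruption_id INTEGER NOT NULL,
        startTime     TEXT    NOT NULL COLLATE NOCASE
    )
""")
# The index can now drop its per-column COLLATE clauses.
con.execute("""
    CREATE UNIQUE INDEX Disruptions_Idx
        ON Disruptions (version ASC, Disruption_id ASC, startTime ASC)
""")
con.execute("INSERT INTO Disruptions (version, Disruption_id, startTime)"
            " VALUES (1, 1, 'Monday')")
# The column's NOCASE collation is honoured by the index: 'MONDAY'
# collides with 'Monday', so this insert raises sqlite3.IntegrityError.
try:
    con.execute("INSERT INTO Disruptions (version, Disruption_id, startTime)"
                " VALUES (1, 1, 'MONDAY')")
except sqlite3.IntegrityError:
    pass
```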


> Please try moving your COLLATE clauses into the table definition.  
> e.g. instead of
>
>     CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
> COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
> COLLATE NOCASE ASC);
>
> Your table definition should have
>
> "version" integer NOT NULL COLLATE NOCASE,
> "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
> ...
> "location" integer NOT NULL COLLATE NOCASE,
>
> and the index should be
>
>     CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
>         ("version" ASC, "Disruption_id" ASC, "location" ASC);
>
> Once data has been entered, do ANALYZE.  This step may take a long
> time.
>
> Simon.

Re: Very, very slow commits

Rob Willett
Simon,

As an exercise we have just added COLLATE NOCASE to our integer
columns.

Whoops! We thought this would make no difference, but it's added an
extra 70% to our processing times.

We've now got to the stage where we can make changes quickly, so we'll
back that change out and go back to the integer definition without
COLLATE NOCASE.

Rob
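One cheap safety net when changing collations back and forth: EXPLAIN QUERY PLAN shows whether a query can still use an index, since for text columns a collation mismatch between the comparison and the index stops SQLite from using it. A hedged, self-contained sketch (Python stdlib sqlite3, illustrative column and index names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Disruptions ("
            "id INTEGER PRIMARY KEY, "
            "startTime TEXT NOT NULL COLLATE NOCASE)")
con.execute("CREATE INDEX Disruptions_start ON Disruptions (startTime)")

# The plan should mention the index; if a collation mismatch crept in,
# SQLite would fall back to a full table scan instead.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM Disruptions WHERE startTime = ?",
    ("2018-07-30",),
).fetchall()
for row in plan:
    print(row[-1])   # the human-readable detail column of the plan
```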

On 31 Jul 2018, at 14:59, Rob Willett wrote:

> [snip]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Very, very slow commits - Possibly solved

Rob Willett
Dear all,

We think we have now found the issue with the slow commits.

We believe this is due to an inherent (and old) defect in our database
design. Our original design has an implicit ordering of rows in a
table; while the table is only ever growing, this flaw isn't apparent.

However, when we started deduping the table AND copying rows from one
table to another to move things around, we changed the underlying order
of rows. SQLite handles the change, BUT the flaw in our design becomes
apparent as we keep moving the data around and it gets mixed up. The
database slows down when we create a second table with an identical
structure to the first, copy the data into the new table, drop the old
one, and rename the new table back to the original name. Logically
speaking, SQLite shouldn't notice the difference in row order, but
things do slow down, even after ANALYZE.

We think a better index definition could solve the problem for us; a
better database design certainly would, but that's a trickier change.

We're now going back to our 60GB database and starting from scratch to
see if we can recreate the issue (now we think we know what it is).

Thanks to everybody who contributed ideas, we appreciate the help.

Rob
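The copy / drop / rename rebuild described above can be sketched with Python's sqlite3 module. The table shape and the keep-predicate (`version >= 2`) are illustrative stand-ins for the real dedup logic, not the production schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # the real case uses a file on disk
con.execute("CREATE TABLE Disruptions (id INTEGER PRIMARY KEY, version INTEGER)")
con.executemany("INSERT INTO Disruptions VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

# Rebuild: copy the rows worth keeping into an identically-shaped new
# table, drop the old table, rename the new one into place.
con.executescript("""
    CREATE TABLE Disruptions_new (id INTEGER PRIMARY KEY, version INTEGER);
    INSERT INTO Disruptions_new
        SELECT id, version FROM Disruptions WHERE version >= 2;
    DROP TABLE Disruptions;
    ALTER TABLE Disruptions_new RENAME TO Disruptions;
""")

# Rebuild planner statistics, then defragment the file so the surviving
# rows and index pages are laid out contiguously again.
con.execute("ANALYZE")
con.execute("VACUUM")

print(con.execute("SELECT COUNT(*) FROM Disruptions").fetchone())  # (1,)
```

The final VACUUM matters for the fragmentation theory discussed later in the thread: without it, the rebuilt table inherits whatever free-page layout the drop left behind.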


Re: Very, very slow commits - Possibly solved

Chris Locke-3
I've been following this thread with interest, but this just doesn't make
sense...

> Logically speaking SQLite shouldn't notice the difference in row order,
> but things do slow down, even with analyse.

Are you accessing each row via its ID?  Even so, that should still be
indexed.
I thought you were simply adding records into the database - I'm failing to
grasp how this is slowing down in the new database.


Thanks,
Chris




Re: Very, very slow commits

Simon Slavin-3
In reply to this post by Rob Willett
On 31 Jul 2018, at 2:59pm, Rob Willett <[hidden email]> wrote:

> We've created a new table based on your ideas, moved the collate into the table, analysed the database. We did **not** add COLLATE NOCASE to the columns which are defined as integers. Would that make a difference?

What you did is correct.  I gave wrong advice for which I apologise.  But I am now confused since your original code is a little strange.  Your original has a table definition including

        "version" integer NOT NULL,

but then

        CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC);

If "version" really is an INTEGER, then it is incorrect to use version COLLATE NOCASE in the index.  NOCASE is purely for text values.  This may be slowing things down.

To solve it, in the table definition, use COLLATE NOCASE for TEXT columns and not for INTEGER columns.  Also, remove all mentions of COLLATE NOCASE in your index definitions.  Collation methods should be set in the table definition, not in indexes, except for some unusual situations.

This should increase your speed relative to your original timings.  If it slows things down, something else I haven't spotted is wrong.

> We've found it now takes around 10% longer to do the queries than before.

That is understandable given the incorrect advice I gave you before.

In another post you report some strange timing problems with no simple explanation.  When I get such things I suspect database corruption or hardware problems and run an integrity_check.  But with a 60Gig database I might think twice.

Simon.
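Simon's integrity check can be run from any SQLite client. A sketch with Python's sqlite3 module (an in-memory database stands in for the real file): `quick_check` skips the verification that index content matches table content and the UNIQUE checking, so it is a much faster first pass on a 60 GB file.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # point this at the real database file
con.execute("CREATE TABLE Disruptions (id INTEGER PRIMARY KEY)")

# Both pragmas return the single row ('ok',) when nothing is wrong;
# otherwise they return one row per problem found.
full = con.execute("PRAGMA integrity_check").fetchone()[0]
quick = con.execute("PRAGMA quick_check").fetchone()[0]
print(full, quick)  # ok ok
```

Running quick_check first, and the full integrity_check only if it reports anything, limits how often the whole 60 GB has to be read.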

Re: [EXTERNAL] Re: Very, very slow commits - Possibly solved

Hick Gunter
In reply to this post by Rob Willett
Based on the currently available information I would suggest the following schema:

CREATE TABLE IF NOT EXISTS "Disruptions" (
         "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
         "version" integer NOT NULL,
         "Disruption_id" INTEGER NOT NULL,
         "status" integer NOT NULL,
         "category" integer NOT NULL,
         "subCategory" integer NOT NULL,
         "location" integer NOT NULL,
         "levelOfInterest" integer NOT NULL,
         "corridor" integer NOT NULL,
         "severity" integer NOT NULL,
         "startTime" TEXT NOT NULL COLLATE NOCASE,
         "comments" integer NOT NULL,
         "currentUpdate" integer NOT NULL,
         "CauseAreaPointX" real NOT NULL,
         "CauseAreaPointY" real NOT NULL,
         "endTime" text NOT NULL COLLATE NOCASE,
         "remarkTime" TEXT NOT NULL COLLATE NOCASE,
         "lastModTime" TEXT NOT NULL COLLATE NOCASE,
         "Direction" TEXT COLLATE NOCASE
);

CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location","corridor","status","category","severity","levelOfInterest","version","subCategory","startTime");

CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id","version","category","subCategory");

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version","Disruption_id","location");

CREATE INDEX Disruptions_Idx5 ON Disruptions ("status","Disruption_id","Severity","levelOfInterest","category","subCategory","version");

Remarks:

COLLATE is only required for text values. If you always store data of the declared type, COLLATE has no business with a non-text column.

I have reordered the fields so that fields used in one or more indexes are at the front of the record. This allows SQLite to quit decoding the record faster when building index strings. I have assumed that each index is used/updated equally often; if you have an index that is used most, reordering the fields may help processing speed.

Non-index fields should be ordered so that fields that feature prominently in the retrieved data come first; again, this will allow SQLite to quit decoding the record earlier when reading data.

It may also improve performance to create a "covering index" by adding the (one or two) fields retrieved to the index used for locating the record. This allows SQLite to retrieve these fields directly from the index BTree without referring back to the table BTree.

I assume the order of the fields of each index matches the order of the fields in the ORDER BY clause(s) of the queries that use the respective index.
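The covering-index point above can be verified with EXPLAIN QUERY PLAN: when an index carries every column a query needs, the plan reports a COVERING INDEX and the table B-tree is never consulted. A sketch under a cut-down illustrative schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Disruptions (Disruption_id INTEGER, version INTEGER, status INTEGER)"
)
# 'status' is a trailing index column purely so the query below can be
# answered from the index B-tree alone, without touching the table.
con.execute(
    "CREATE INDEX Disruptions_cover ON Disruptions (Disruption_id, version, status)"
)

detail = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT status FROM Disruptions WHERE Disruption_id = 1 AND version = 2"
).fetchone()[3]
print(detail)  # the plan text mentions 'USING COVERING INDEX Disruptions_cover'
```

Dropping `status` from the index makes the same query fall back to a table lookup per matching index entry, which is the extra cost a covering index avoids.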



___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.

Re: Very, very slow commits - Possibly solved

Rob Willett
In reply to this post by Chris Locke-3
Chris,

I'll try and summarise.

1. We have a 60GB database collecting data. This database is accessed by
a single process once every five mins and around 5MB of data (approx
600-800 rows) is added. Data has never been deleted.

2. The database is getting too big for the server it's hosted on. We're
struggling to back it up, or do much with it as its hosted on a Virtual
Private Server.

3. We took a long hard look at the database a few months ago and tried
to work out what we could do. When we designed the database we weren't
completely sure what data we would need so we went overboard and stored
a lot of data, hence the database is growing.

4. We realised that a lot of the data is redundant, not in a normalised
database form of redundancy but is data we don't actually need now. We
thought we did, but our expectations are now different. Most of the data
is held in a single table which is currently 200M rows long.

5. We worked out we could remove approx 99% of the data and everything
that we currently do *should* work as before. The work we have been
discussing in this thread is our testing of this reduction or
de-duplication work. Currently the production system is untouched and
works well and is performant.

6. The work to reduce the main table has been difficult as the table is
so large AND we are using a Virtual Private Server which has IO
limitations as its based on OpenVZ. The supplier doesn't want us
consuming all the available resources.

7. We developed a couple of techniques for trying to speed up the
reduction of the main database table. Rather than removing rows from the
table, we copied out the required rows to a new identical table but we
only needed to copy out approx 500,000 rows as opposed to 200,000,000.
We then discovered that dropping a 200M row table on a VPS server is
slow. Circa 10 hours. On a new home built and large server it's a few
minutes. We only found this out late in the process.

8. Once we constructed the new table and new database (600Mb now rather
than 60GB) we started testing it on a test server. This is a smaller
version of the main production server, e.g. it has two cores rather than
eight, 2GB rather than 8GB. Both the servers use a RAID array of
spinning rust at the back end. We as customers have no idea what this
array is.

9. After some various tests, we noticed that the database seemed to be
slowing down, especially around the commit statement. It was taking
around 7 secs to commit what should be a tiny amount of data (5MB). The
average work we do in a process is off the database parsing and
processing an XML file. The database actions we do are normally a simple
insert to add rows to the main table with very occasional updates of
other tables.

10. We then built a large server in our office under ESXI to replicate
the production server and to try and move the work closer to us, so we
could try and see what the problem is. This local server is faster than
our production server BUT it doesn't have the network connections,
redundancy and other features we need for production.  We tried to
replicate the steps we did last week to see if we could reproduce the
problem. We used the technique of copying to a new table, dropping the
200M row table and renaming the new table back to the original name. We
have other techniques which involve working with the 200M row table in
situ, but the copy-and-rename approach seemed to be faster on our VPS
server. On our home-built server, we think that working with the table
as-is would be faster.

11. We worked through our steps one by one to reproduce our smaller
database. We vacuumed and analysed the database and then copied it back
to a test server back on our VPS estate.

12. We benchmarked the database in the test VPS server and got around
9-10 secs per run. As this is a test server it's significantly slower
than our prod server, but it's a baseline we can work with. We send
through 25 iterations of data to get the baseline.

13. We then started 'playing' about with indexes: creating them with
different collations, creating tables with collations, including
integer collations which we think should be cost-neutral. As we copied
data from table to table to see what happened, we noticed that the
speed changed significantly, from 10 secs to around 16-18 secs. As far
as we could see this was due to simply moving the data around. We
always created the 'right' schema to copy into and didn't allow SQLite
to work out the types. We ran ANALYZE and VACUUM on the data after
moving tables. We also created and recreated indexes as needed.

14. We think that the constant moving of data around between tables is
fragmenting tables and indexes on the disk, so when we add new rows to
the vacuumed table we are adding them all over the place, and commits
are taking longer and longer. There was also a discussion that SSDs may
mean we are constantly getting misses in the OS file cache. I've yet to
validate that theory. It could also be that something we do messes with
the sqlite_sequence table and data is being inserted into holes
somewhere.

15. We also have looked at an older piece of code and we *think* it
makes an assumption that data is held in contiguous rows (or it could be
that the query is poorly written and that we need to look at the
indexes). The code isn't so obvious as to look at row_ids and work with
that, but it's a hint that we're still chasing down.

16. We did notice that running ANALYZE actually made the database
perform worse than before. This was because a specific index was used
before the ANALYZE, and afterwards an automatic covering index was used
instead. We then created a real index to get it working correctly
again.

17. We're now going back to our 60GB database to try and work through
the whole process again to see if we can confirm our hypothesis.

Rob
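The automatic-index symptom in point 16 can be spotted mechanically: a transient index shows up in EXPLAIN QUERY PLAN output containing the word "AUTOMATIC". The sketch below (illustrative table names, not the real schema) shows the marker appearing when no suitable index exists and disappearing once a real one is created; it demonstrates the detection technique, not the exact regression from the thread:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, i % 7) for i in range(100)])

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings for a query."""
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM t AS x JOIN t AS y ON x.a = y.a"

# With no index on the join column, SQLite builds a transient
# 'AUTOMATIC' index for the inner loop of the join -- a sign that a
# persistent index is missing.
before = plan(query)
con.execute("CREATE INDEX t_a ON t (a)")
con.execute("ANALYZE")
after = plan(query)
print(before)
print(after)
```

An automatic index is rebuilt on every query that needs it, which is exactly the sort of hidden cost that only shows up as unexplained slowdowns.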


Re: Very, very slow commits

Rob Willett
In reply to this post by Simon Slavin-3
Simon,

Absolutely no need to apologise. We should apologise for all the time we
have taken from other people :(

We recognise that COLLATE NOCASE is inappropriate for our database. We
suspect it was added by a SQLite tool we used some time ago. We are
going to use this opportunity to remove this sort of nonsense.

We still think we have an inherent design fault in our database that we
are trying to understand. One of the problems we had was that checking
the performance of a 200M row table for bad indexes is time consuming.
We now have a workflow to get from a 60GB database to a 600MB database
in a few hours. We cannot do all the work in SQL as it involves an
external program to analyse the data but a few hours to run isn't bad.
As we now have the database held locally, we can thrash the local server
silly to get the performance we need.

Rob
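For the kind of commit benchmarking done throughout this thread, it helps to time the COMMIT separately from the INSERTs: a slow fsync or a badly fragmented file shows up in the commit, not in the statements. A minimal sketch with Python's sqlite3 module (in-memory database and row count are illustrative stand-ins):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")  # point this at the real file in practice
con.execute("CREATE TABLE Disruptions (id INTEGER PRIMARY KEY, payload TEXT)")

# Roughly one five-minute batch from the thread: ~600-800 rows.
rows = [("disruption %d" % i,) for i in range(800)]

# The sqlite3 module opens a transaction implicitly before the INSERTs.
con.executemany("INSERT INTO Disruptions (payload) VALUES (?)", rows)

# Time only the COMMIT, so write/flush cost is measured on its own.
start = time.perf_counter()
con.commit()
print("commit took %.4f s" % (time.perf_counter() - start))
```

Logging this number per batch over a day makes a gradual slowdown (fragmentation, cache misses) visible as a trend rather than an anecdote.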

On 31 Jul 2018, at 16:18, Simon Slavin wrote:

> On 31 Jul 2018, at 2:59pm, Rob Willett <[hidden email]>
> wrote:
>
>> We've created a new table based on your ideas, moved the collate into
>> the table, analysed the database. We did **not** add COLLATE NOCASE
>> to the columns which are defined as integers. Would that make a
>> difference?
>
> What you did is correct.  I gave wrong advice for which I apologise.  
> But I am now confused since your original code is a little strange.  
> Your original has a table definition including
>
> "version" integer NOT NULL,
>
> but then
>
> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions
> ("Disruption_id" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> "category" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
>
> If "version" really is an INTEGER, then it is incorrect to use version
> COLLATE NOCASE in the index.  NOCASE is purely for text values.  This
> may be slowing things down.
>
> To solve it, in the table definition, use COLLATE NOCASE for TEXT
> columns and not for INTEGER columns.  Also, remove all mentions of
> COLLATE NOCASE in your index definitions.  Collation methods should be
> set in the table definition, not in indexes, except for some unusual
> situations.
>
> This should increase your speed relative to your original timings.  If
> it slows things down, something else I haven't spotted is wrong.
>
>> We've found it now takes around 10% longer to do the queries than
>> before.
>
> That is understandable given the incorrect advice I gave you before.
>
> In another post you report some strange timing problems with no simple
> explanation.  When I get such things I suspect database corruption or
> hardware problems and run an integrity_check.  But with a 60Gig
> database I might think twice.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
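Simon's point above, declare the collation on the TEXT columns and keep it out of the indexes, can be sketched quickly with Python's built-in sqlite3 module. This is a hypothetical two-column table, not the real Disruptions schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Disruptions (
        Disruption_id INTEGER NOT NULL,            -- integer: no COLLATE here
        category      TEXT NOT NULL COLLATE NOCASE -- collation lives on the column
    );
    -- The index needs no COLLATE clause; it inherits NOCASE from the column.
    CREATE INDEX Disruptions_cat ON Disruptions (category);
""")
con.execute("INSERT INTO Disruptions VALUES (1, 'Roadworks')")

# Comparisons on the column are case-insensitive without repeating
# COLLATE NOCASE in the index or in the query:
row = con.execute(
    "SELECT Disruption_id FROM Disruptions WHERE category = 'ROADWORKS'"
).fetchone()
print(row)  # (1,)
```

Declaring the collation once, on the column, keeps every index and comparison consistent; sprinkling COLLATE NOCASE over index definitions is what caused the confusion in this thread.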

Re: [EXTERNAL] Re: Very, very slow commits - Possibly solved

Rob Willett
In reply to this post by Hick Gunter
Gunter,

Thanks for this. We have already started on this as we realised that the
COLLATE NOCASE was irrelevant and actually slowed down integer queries.

What we have not done is reorder the table columns to match the indexes.
This had not occurred to us.

We think we have already created the indexes so that we 'cover' the
queries. One advantage (!) of a 200M row table is that inefficient
queries are immediately obvious as we have to do a table scan.

We have not checked that the order of columns in the index matches the
ORDER BY clauses. We never thought of that either.

Thanks for the help.

Rob

On 31 Jul 2018, at 16:21, Hick Gunter wrote:

> Based on the currently available information I would suggest the
> following schema:
>
> CREATE TABLE IF NOT EXISTS "Disruptions" (
>          "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
>          "version" integer NOT NULL,
>          "Disruption_id" INTEGER NOT NULL,
>          "status" integer NOT NULL,
>          "category" integer NOT NULL,
>          "subCategory" integer NOT NULL,
>          "location" integer NOT NULL,
>          "levelOfInterest" integer NOT NULL,
>          "corridor" integer NOT NULL,
>          "severity" integer NOT NULL,
>          "startTime" TEXT NOT NULL COLLATE NOCASE,
>          "comments" integer NOT NULL,
>          "currentUpdate" integer NOT NULL,
>          "CauseAreaPointX" real NOT NULL,
>          "CauseAreaPointY" real NOT NULL,
>          "endTime" text NOT NULL COLLATE NOCASE,
>          "remarkTime" TEXT NOT NULL COLLATE NOCASE,
>          "lastModTime" TEXT NOT NULL COLLATE NOCASE,
>          "Direction" TEXT COLLATE NOCASE
> );
>
> CREATE INDEX "Disruptions_Idx1" ON Disruptions
> ("location","corridor","status","category","severity","levelOfInterest","version","subCategory","startTime");
>
> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions
> ("Disruption_id","version","category","subCategory");
>
> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
> ("version","Disruption_id","location");
>
> CREATE INDEX Disruptions_Idx5 ON Disruptions
> ("status","Disruption_id","Severity","levelOfInterest","category","subCategory","version");
>
> Remarks:
>
> COLLATE is only required for text values. If you always store data of
> the declared type, COLLATE has no business with a non-text column.
>
> I have reordered the fields so that fields used in one or more indexes
> are at the front of the record. This allows SQLite to quit decoding
> the record faster when building index strings. I have assumed that
> each index is used/updated equally often; if you have an index that is
> used most, reordering the fields may help processing speed.
>
> Non-index fields should be ordered so that fields that feature
> prominently in the retrieved data come first; again, this will allow
> SQLite to quit decoding the record earlier when reading data.
>
> It may also improve performance to create a "covering index" by adding
> the (one or two) fields retrieved to the index used for locating the
> record. This allows SQLite to retrieve these fields directly from the
> index BTree without referring back to the table BTree.
>
> I assume the order of the fields of each index matches the order of
> the fields in the ORDER BY clause(s) of the queries that use the
> respective index.
>
> -----Ursprüngliche Nachricht-----
> Von: sqlite-users
> [mailto:[hidden email]] Im Auftrag von
> Rob Willett
> Gesendet: Dienstag, 31. Juli 2018 16:31
> An: SQLite mailing list <[hidden email]>
> Betreff: [EXTERNAL] Re: [sqlite] Very, very slow commits - Possibly
> solved
>
> Dear all,
>
> We think we have now found the issue with the slow commits.
>
> We believe this is due to an inherent (and old) defect in our database
> design. We think our original design has an implicit ordering of rows
> in a table; while the table is only ever growing, this flaw in the
> design isn't apparent.
>
> However, when we started deduping the table AND copying rows from one
> table to another to move things around, we changed the underlying
> order of rows. SQLite handles the change, BUT the flaw in our design
> becomes apparent as we keep moving the data around and it gets mixed
> up. When we create a second table with an identical structure to the
> first, copy the data into the new table, drop the old one and rename
> the new table to the old name, things slow down. Logically speaking
> SQLite shouldn't notice the difference in row order, but things do
> slow down, even after ANALYZE.
>
> We think that a better index definition could solve the problem for
> us; a better database design certainly would, but that's a tricky
> problem.
>
> We're now going back to our 60GB database and start from scratch to
> see if we can create the issue (now we think we know what it is).
>
> Thanks to everybody who contributed ideas, we appreciate the help.
>
> Rob
>
> On 31 Jul 2018, at 15:19, Rob Willett wrote:
>
>> Simon,
>>
>> As an exercise we have just added in COLLATE NOCASE to our integer
>> columns.
>>
>> Whoops! We thought this would make no difference, but it's added an
>> extra 70% to our processing times.
>>
>> We've now got to the stage where we can make changes quickly, so
>> we'll
>> back that change out and go back to the integer defn without COLLATE
>> NOCASE.
>>
>> Rob
>>
>> On 31 Jul 2018, at 14:59, Rob Willett wrote:
>>
>>> Simon,
>>>
>>> Apologies for taking so long to get back; we've been building a test
>>> system and it's taken a long time.
>>>
>>> We're just getting round to trying your ideas out to see what
>>> difference they make.
>>>
>>> We've created a new table based on your ideas, moved the collate
>>> into
>>> the table, analysed the database. We did **not** add COLLATE NOCASE
>>> to the columns which are defined as integers. Would that make a
>>> difference?
>>>
>>> We've found it now takes around 10% longer to do the queries than
>>> before.
>>>
>>> Rob
>>>
>>>
>>>> Please try moving your COLLATE clauses into the table definition.
>>>> e.g. instead of
>>>>
>>>>     CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
>>>> ("version"
>>>> COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
>>>> COLLATE NOCASE ASC);
>>>>
>>>> Your table definition should have
>>>>
>>>>      "version" integer NOT NULL COLLATE NOCASE,
>>>>      "Disruption_id" INTEGER NOT NULL COLLATE NOCASE, ...
>>>>      "location" integer NOT NULL COLLATE NOCASE,
>>>>
>>>> and the index should be
>>>>
>>>>     CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
>>>>         ("version" ASC, "Disruption_id" ASC, "location" ASC);
>>>>
>>>> Once data has been entered, do ANALYZE.  This step may take a long
>>>> time.
>>>>
>>>> Simon.
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> [hidden email]
>>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___________________________________________
>  Gunter Hick | Software Engineer | Scientific Games International GmbH
> | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013
> | (O) +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the
> addressee.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
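Gunter's covering-index remark above is easy to verify with EXPLAIN QUERY PLAN. A minimal sketch, assuming a hypothetical cut-down Disruptions table and Python's built-in sqlite3 module: when every column the query touches is present in the index, SQLite reports a COVERING INDEX and never reads the table b-tree.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Disruptions (
        Disruption_id INTEGER NOT NULL,
        version       INTEGER NOT NULL,
        severity      INTEGER NOT NULL
    );
    -- The index holds both the WHERE column and the selected columns:
    CREATE INDEX Disruptions_cover
        ON Disruptions (Disruption_id, version, severity);
""")

# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
detail = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT version, severity FROM Disruptions WHERE Disruption_id = 42"
).fetchone()[3]
print(detail)  # e.g. SEARCH ... USING COVERING INDEX Disruptions_cover ...
```

The trade-off is the usual one: a wider index covers more queries but costs more on every INSERT and UPDATE, which matters for a 200M-row table.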

Re: [EXTERNAL] Re: Very, very slow commits - Possibly solved

Simon Slavin-3
On 31 Jul 2018, at 4:42pm, Rob Willett <[hidden email]> wrote:

> We have not checked that the order of columns in the index matches the ORDER BY clauses. We never thought of that either.

That is going to make a big difference.  Well done Gunter.  Rob: don't forget that once you have your indexes defined and data in the tables, do an ANALYZE.

For those who don't know, columns of an index should match first the columns mentioned in the WHERE clause, then the columns in the ORDER BY clause, in the order they are mentioned.  For instance, except for very unusual data,

SELECT * FROM MyTable WHERE y < 4 AND l = 16 ORDER BY a, t, r

perfect index: (l, y, a, t, r)

  less useful: (l, y, r, a, t)

  less useful: (y, l, a, t, r)

of little use: (a, t, r, l, y)

Any index which starts with anything except l is almost useless, even though it mentions every column needed.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
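Simon's ordering rule can be checked mechanically with EXPLAIN QUERY PLAN. A sketch using his hypothetical MyTable, via Python's built-in sqlite3 module: with no index the planner can only scan, while an index led by the equality column l lets it seek on l = 16 and range-scan y < 4.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE MyTable (y INTEGER, l INTEGER, a INTEGER, t INTEGER, r INTEGER)"
)

query = ("EXPLAIN QUERY PLAN "
         "SELECT * FROM MyTable WHERE y < 4 AND l = 16 ORDER BY a, t, r")

# No index yet: the only option is a full table scan.
before = con.execute(query).fetchone()[3]
print(before)  # e.g. SCAN MyTable

# Equality column first, then the inequality column, then the ORDER BY columns.
con.execute("CREATE INDEX perfect_idx ON MyTable (l, y, a, t, r)")
after = con.execute(query).fetchone()[3]
print(after)   # e.g. SEARCH MyTable USING INDEX perfect_idx (l=? AND y<?)
```

Note that after the range constraint on y, SQLite may still need a sort step for the ORDER BY; the win shown here is turning the SCAN into a SEARCH.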

Re: [EXTERNAL] Re: Very, very slow commits - Possibly solved

David Raymond
In reply to this post by Rob Willett
Take a look at the queries being run and do an "explain query plan" for each and look at the output to see what it decides to use. Look for "SCAN TABLE" cases that might benefit, or other oddities where it's doing anything you're not expecting.
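The check David describes can be scripted. A minimal sketch against a hypothetical cut-down table, using Python's built-in sqlite3 module: the SCAN in the plan detail becomes a SEARCH once the missing index exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Disruptions (Disruption_id INTEGER, status INTEGER)")

def plan(sql):
    # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# No index on status: the planner has to visit every row.
scan = plan("SELECT * FROM Disruptions WHERE status = 1")
print(scan)    # e.g. ['SCAN Disruptions']

con.execute("CREATE INDEX Disruptions_status ON Disruptions (status)")
search = plan("SELECT * FROM Disruptions WHERE status = 1")
print(search)  # e.g. ['SEARCH Disruptions USING INDEX Disruptions_status (status=?)']
```

Running this over each production query is a quick way to catch the table scans David warns about before they show up on a 200M-row table.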


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Rob Willett
Sent: Tuesday, July 31, 2018 11:42 AM
To: SQLite mailing list
Subject: Re: [sqlite] [EXTERNAL] Re: Very, very slow commits - Possibly solved

Gunter,

Thanks for this. We have already started on this as we realised that the
COLLATE NOCASE was irrelevant and actually slowed down integer queries.

What we have not done is reorder the table to match the index queries.
This had not occurred to us.

We think we have already created the indexes so that we 'cover' the
queries. One advantage (!) of a 200M row table is that inefficient
queries are immediately obvious as we have to do a table scan.

We have not checked that the order of columns in the index matches the
ORDER BY clauses. We never thought of that either.

Thanks for the help.

Rob

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Very, very slow commits - Possibly solved

Rob Willett
David,

We've been through that level of detail. That's how we found that, after
an ANALYZE, a new automatic covering index was being created for a query
that hadn't needed one before the ANALYZE. That puzzled us and still
does.

Rob

On 31 Jul 2018, at 16:58, David Raymond wrote:

> Take a look at the queries being run and do an "explain query plan"
> for each and look at the output to see what it decides to use. Look
> for "SCAN TABLE" cases that might benefit, or other oddities where
> it's doing anything you're not expecting.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users