Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Hi,

We're doing a massive tidy on our database, which is approx 50GB.

One table is approx 49GB of that 50GB, which we need to delete as we have
recorded the information in a far better format that's approx 99% more
efficient. If only we had been this clever when we started....

We've just dropped the table, assuming that dropping it would be quite
quick. It's not. So far we've been waiting for 30 mins and nothing has
come back yet. We can see that the -wal file is up to 2.5GB. We have
this terrible feeling that it'll need to get to 49GB or so before the
table gets dropped. We can just about handle that in the current
filesystem.

We're now getting nervous about dropping this table. We had assumed that
it would be a really quick and easy operation, based on absolutely no
checking whatsoever. When we looked online, all we could see was a
reference to a very, very old and outdated page
(https://sqlite.org/speed.html) which talks about speed and has these
comments at the bottom:

"SQLite is slower than the other databases when it comes to dropping
tables. This probably is because when SQLite drops a table, it has to go
through and erase the records in the database file that deal with that
table. MySQL and PostgreSQL, on the other hand, use separate files to
represent each table so they can drop a table simply by deleting a file,
which is much faster.

On the other hand, dropping tables is not a very common operation so if
SQLite takes a little longer, that is not seen as a big problem."

Is this still the case? Is it going to take a long time? If the table is
49GB, will we need to wait until the -wal file reaches 49GB? By our
estimates that's approximately 10 hours away.

Any help, ideas or suggestions welcomed, but please be quick.

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Dan Kennedy
Is secure-delete turned on?

   https://www.sqlite.org/pragma.html#pragma_secure_delete
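
If it is, something like this in the sqlite3 shell will show it and turn
it off for the current connection (a sketch; 1 means on, 0 means off):

   sqlite> PRAGMA secure_delete;        -- query the current setting
   1
   sqlite> PRAGMA secure_delete = OFF;  -- stop zeroing out deleted content
   0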

Dan.

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Dan,

Thanks for the reply.

sqlite> pragma secure_delete;
1

Which is a bit of a surprise, as we have never seen it before; we had to
look it up using your reference. No idea why that is set (if 1 means it
is on).

Should we simply Ctrl-C the deletion, turn secure_delete off and then
try again? My working assumption is that since there is a -wal file we
are safe to do this.

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Dan Kennedy
That sounds fine. Without secure-delete, the wal file should be pretty
small.

You can always safely kill a process in the middle of an SQLite
transaction, so long as you don't do anything foolhardy like deleting
the wal or journal files afterwards.

Dan.

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Dan,

We've killed the process (kill -9). Fired up sqlite3 again, closed it
down normally. The -wal files were removed.

Fired up sqlite3 again, turned off secure_delete, started to drop the
table again, and reniced it down (long story to do with IOPS and our VPS
provider).
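
For the record, the renicing was along these lines (a sketch; 12345 is a
placeholder for the sqlite3 process id, and ionice is Linux-specific):

   renice -n 19 -p 12345    # lowest CPU priority
   ionice -c 3 -p 12345     # "idle" I/O scheduling class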

The -wal file is empty, and the -shm file is a steady 32768 bytes.

No idea if anything is actually happening now, but at least we don't
have a massive -wal file.

Suspect this could take all night (it's 21:24 in London), so we'll detach
the screen session and come back later.

Many thanks for the help.

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Mmm.... It's still taking an awful long time, though the -wal file is
very small.

It does look as if one of sqlite's weaknesses is dropping very, very
large tables.

Oh well, let's let it run through the night.

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Richard Hipp
On 7/16/18, Rob Willett <[hidden email]> wrote:
>
> It does look as if one of sqlite's weaknesses is dropping very, very
> large tables.
>

Right.  If every table were stored in a separate file, a DROP TABLE
could be translated to a relatively fast unlink().  But then a
database would be a directory full of files, rather than a single
file, which would undermine a lot of the usefulness of SQLite.
Furthermore, DROP TABLE is an uncommon operation.  We prefer to
provide a database where all content is contained in a single file and
that is optimized for SELECTs and for INSERTs and DELETEs of a subset of
the rows in the table, as that seems to be far more useful in most
cases.  There are always engineering trade-offs.  SQLite provides
single-file databases and fast queries in exchange for slower DROP
TABLEs.

A filesystem is able to implement unlink() quickly because it has the
entire partition available for laying out the locations of files.
Space can be allocated to a file in large chunks, which makes
deallocation faster.  In other words, in a filesystem, the files can
be positioned sparsely on disk, with lots of unused space in between
the various files to accommodate growth. But in SQLite, the goal is to
keep the database file size as small as possible, and to not have
unnecessary unused pages in the middle of the database.  Hence, space
for tables must be allocated in relatively small 1-page chunks, which
means that there will be a large number of chunks to deallocate when
dropping a large table.  If you were to construct a filesystem that
tried to keep all file content tightly packed at the beginning of a
partition (say, for example, to make shrinking of a partition faster)
then unlink() would necessarily be slower on that filesystem.  That
seems like a bad engineering trade-off for a filesystem, but it is
(for most applications) a good trade-off for a database such as
SQLite.

You can work around this.  If you have one or more tables that you
think might need to be DROP-ed frequently, then consider storing them
in separate database files and ATTACH-ing those separate database
files to your main database.  Then, to drop those tables, you can
DETACH them, and then unlink() the corresponding database file.  That
should go much faster.
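
A sketch of that pattern in the sqlite3 shell, with hypothetical file
and table names:

   ATTACH DATABASE 'bulk.db' AS bulk;
   CREATE TABLE bulk.big_events(id INTEGER PRIMARY KEY, payload BLOB);
   -- ... write to and query bulk.big_events as usual ...
   DETACH DATABASE bulk;
   -- then "drop" the table by removing the file at the OS level:
   --   rm bulk.db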

--
D. Richard Hipp
[hidden email]

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Jay Kreibich
Depending on the complexity of the schema, it sounds like it might be easier to just dump the other tables, re-create the database without this jumbo table, and re-import the other data.
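
For example, from the command line (a sketch; the table names are
hypothetical, and .dump accepts LIKE patterns naming the tables to keep):

   sqlite3 old.db ".dump small_table1 small_table2" | sqlite3 new.db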

 -j

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Jay,

I think your approach would have been quicker. Ten hours so far and it's
still deleting the table :(

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Richard,

Thanks for the comprehensive update.

We appreciate that there are tradeoffs, and that dropping a table is a
rarer operation than most others. The upside of the SQLite design is
that we treat a database as a single file which we copy around with
ease. You cannot do that with many other databases; the downside,
though, is when you need to drop a table.

A less charitable person would have said that it was our own fault for
designing the database with approx 190M records in a single table and
little thought about how we would manage it. We now know this was a
dumb idea; however, you live and learn.

The impact of this is simply time. We've worked out how to shrink the
database from 50GB to approx 1GB; after this we can prune the database
on a weekly basis so this never becomes an issue.
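
The weekly prune itself should just be a scheduled DELETE, something
like this sketch (the table and column names here are hypothetical):

   DELETE FROM observations
   WHERE observed_at < datetime('now', '-7 days');
   VACUUM;  -- optionally reclaim the freed pages afterwards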

I suppose the key question is: would we give up the portability of
SQLite for a fast table drop? The answer is no.

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
The table drop finally finished around 07:00 this morning. It took 9.5
hours to drop the circa 190M rows.

I suspect that part of the issue is that the VPS provider we use has a
rate limiter on IOPS, which is not normally an issue for us but might
have slowed things down somewhat. However, I don't think it would have
slowed things down by hours.

Lessons to be learnt here:

1. Design your database correctly at the start. This was our biggest
problem; we failed to understand how large a single table would grow and
didn't manage the administration correctly and in good time.
2. Think carefully about the data you actually need. We managed to throw
away 99% of our data and combine what was left to get a 50GB database
down to 500MB.
3. Dropping a table may not be the fastest method; creating all the
other smaller tables in a separate database would have been far quicker.
There's always more than one way to solve a problem.
4. Turn off secure_delete if it's on.
5. Use the mailing group for advice. It's brilliant!
6. Assess, plan, act. That's a diving phrase but very pertinent here:
assess what the problem actually is before you plan and execute.

Now we have other checks to do to assure ourselves that the database is
accurate for our needs.

Thanks to all the people who offered advice and help.

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Simon Slavin
On 17 Jul 2018, at 8:37am, Rob Willett <[hidden email]> wrote:

> I suspect that part of the issue is the VPS provider we use has a rate limiter on IOPS which is not normally an issue for us, but that might have slowed it down somewhat. However I don't think that it would have slowed it down by hours.

Actually I think the VPS had a lot to do with the time the operation took.  Any kind of virtual machine takes a terrible hit during the sort of storage access involved in dropping the table.

SQLite spent that whole time accessing your 50GB database file in an apparently random order.   So you had nine hours of cache misses, causing the virtual machine to continually write virtual pages back to real storage and read other pages into memory.  Virtual systems are optimized for cache hits, not cache misses.

I can't prove it without a lot of pointless data manipulation on your type of VPS, but I think you found its least optimal operation.  The good part is that now that your database is less than 1GB, you're going to see a massive increase in speed, since the whole database may well fit in the cache of your virtual machine.

Must remember in future, when people report unusually slow operations, to ask whether they're using a virtual machine or real hardware.

Simon.

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Simon,

Thanks for this. You make some interesting points about cache hits and
misses. Up until now we hadn't seen a massive performance hit; whilst
some database operations didn't work as fast as we would like them to,
nothing works as fast as we would like.

I'm not familiar with these issues with virtualisation. The VPS we use
has OpenVZ at its core (no pun intended). We can see a little on the
internet about this (well, one article specifically about MySQL and
OpenVZ) but nothing else.

We are in the process of evaluating whether to move to a different VPS;
some use OpenVZ, some use KVM, and internally we use VMware ESXi. Very
little real DB work is done on the actual metal; it's all virtualised.

Do you have any pointers to stuff we can read up on? We don't understand
your comment "SQLite spent that whole time accessing your 50GB database
file in an apparently random order" and would like to try and get more
information about it.

We have the option of moving off OpenVZ to KVM or ESXi, so if we can
understand the issue, we can make a more informed choice. Whilst our DB
has dropped down to 500MB, we still need to do a fair amount of testing
and checking to make sure there are no unusual edge cases (or bugs)
before we promote it to live.

Many thanks

Rob

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Simon Slavin
On 17 Jul 2018, at 12:20pm, Rob Willett <[hidden email]> wrote:

> Do you have any pointers to stuff we can read up on? We don't understand your comment "SQLite spent that whole time accessing your 50GB database file in an apparently random order." and would like to try and get more information about it.

I don't know much about specific virtualisation packages and have no recommendation.  I was just comparing virtual machines in general with operations on "actual metal" as you put it.

SQLite database files are split up into pages.  Each page contains data for exactly one table or index.  So even just a table with a primary index requires two different kinds of pages, one for the row of data, one for the corresponding entry in the primary key index.  And making changes to them both requires changes at other locations: the places where the page indexes are stored.
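
You can see this page-level accounting for yourself, assuming your sqlite3 build includes the dbstat virtual table (a compile-time option, SQLITE_ENABLE_DBSTAT_VTAB):

   SELECT name, count(*) AS pages
   FROM dbstat
   GROUP BY name
   ORDER BY pages DESC;  -- pages used by each table and index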

Worse still, unless all your indexes are monotonic (with values in those indexes only ever increasing), anything progressing through them in order would not be reading through the database file in order.  This means you get continuous cache misses: the next piece of data you need is rarely already in the virtual computer's cache, because it's usually in a part of the file far away from the one you just read.

On a real computer, storage is only one motherboard connection and storage driver away from the CPU.  On a virtual computer the virtual storage is on a file server far away from the processing, and each time data isn't already in the processor's cache, something has to work out where, in the racks of file servers, that page of the file is stored, talk to that file server, and route traffic back and forth to the virtual processor.  So file operations are slower and more expensive than they are for real computers.  And an operation which generates cache misses for 50Gig of data ends up doing more work than it would for a simple "real metal" setup.

Simon.

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

R Smith
On 2018/07/17 1:20 PM, Rob Willett wrote:
>
> Thanks for this. You make some interesting points about cache hits and
> misses. Up until now, we hadn't seen a massive performance hit, whilst
> some database operations didn't work as fast as we would like them to,
> nothing works as fast we would like.

This is actually a common thread seen in this industry. It's quite
normal for engineers or devs to put a system in place (of which they
typically have only one at the start) and then simply assume that
whatever speed it is going at is the correct speed, and often only
bother to check actual speed deficiencies once they hit some real time
barrier (which typically only happens much later) - and this is mostly
just fine.

I propose as a test in future to make a great big DB in whatever format
you intend to use, then run it on some fast hardware and then on the
typical VM, see the difference, and decide whether or not it's horrible
and needs further investigation.
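
A quick sketch of such a test in the sqlite3 shell (sizes are
illustrative; 10M rows of ~200-byte blobs gives a table of roughly 2GB):

   CREATE TABLE bigtest(id INTEGER PRIMARY KEY, payload BLOB);
   WITH RECURSIVE n(i) AS
     (SELECT 1 UNION ALL SELECT i+1 FROM n WHERE i < 10000000)
   INSERT INTO bigtest(id, payload) SELECT i, randomblob(200) FROM n;
   .timer on
   DROP TABLE bigtest;   -- compare this timing on metal vs the VM
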
That said, even in your current problem, you probably wouldn't have
noticed any real performance hit right up until you decided to drop that
49GB table. Even now I'm thinking: you've been using your system for
ages, it's been working great, and one sunny afternoon you had to wait
half a day for one maintenance operation... is that really worth
changing an architecture for? Half-days are ten-a-penny.

>
> I'm not familiar with these issues with virtualisation. The VPS we use
> has OpenVZ at its core (no pun intended). We can see a little on the
> internet about this (well, one article specifically about MySQL and
> OpenVZ) but nothing else.
>
> We are in the process of evaluating whether to move to a different
> VPS; some use OpenVZ, some use KVM, and internally we use VMware ESXi.
> Very little real DB work is done on the actual metal; it's all
> virtualised.

I am not familiar with usage figures on OpenVZ, but I don't think it
should be worse than any other virtualization. The problems described by
others will exist for all of them, and if it is particularly worse in
any area, Google should find a zillion results on the problem. The mere
scantness of results tells me it probably works just fine.


>
> Do you have any pointers to stuff we can read up on? We don't
> understand your comment "SQLite spent that whole time accessing your
> 50GB database file in an apparently random order." and would like to
> try and get more information about it.

Simon (and previously Richard) simply pointed out that the way in which
an SQLite table stores information is not similar to the way a file
system stores it, and as such it may take hits (especially cache misses)
because the VM is optimized to handle file access in the way that normal
programs (the other 98% of stuff out there) read files - sequentially.
Doing random-access reads in a file happens, but it is not common, and
so if you make a VM and have to choose which model to cater for,
SQLite's model never wins. It's usually not horrible either, but a 49GB
SQLite table drop will bring out the worst in every VM - that's the only
point.

>
> We have the option of moving off OpenVZ to KVM or ESXI so if we can
> understand the issue, we can make a more informed choice. Whilst our
> DB has dropped down to 500MB we still need to do a fair amount of
> testing and checking to make sure there are no unusual edge cases (or
> bugs) based before we promote it to live.

Accurate testing will save you every time.
May I ask a curiosity... What kind of data did you store (maybe some
examples if possible) that you could condense it by ~99% like that? Did
you just throw away stuff? Were fields duplicated? Did you discover the
World's bestest new compression method? Did you simply elect to store
stuff that was unneeded or implied or somehow irrelevant and now simply
don't store it anymore? Do you possess Alien technology?



Cheers,
Ryan


Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Simon Slavin
On 17 Jul 2018, at 1:10pm, R Smith <[hidden email]> wrote:

> What kind of data did you store (maybe some examples if possible) that you could condense it by ~99% like that?

I think that the OP's organisation discovered the 'relational' part of RDBMS and implemented normalisation.

To Rob Willett: Ryan Smith's explanation is better than mine.  Please read his post and ignore my wooly one.  I tried three ways to get across the sequential-access vs. random-access point and wasn't really happy with anything I wrote.

Simon.

Re: Any ideas on how long it takes to drop a large table...rather an important question now...

Rob Willett
Simon,

No, we knew about normalisation; the database is normalised, and that
part of the design hasn't changed in years.

The reason for the massive reduction in database size is that we worked
out how to handle repeating lines of data that change just enough that
we thought they needed to be kept. With some small tweaks we could
eliminate data that we thought we needed to preserve. We had assumed
that we needed to keep everything, but by going back in and looking at
what actual data we now had, we worked out we could store a small amount
of extra information that captures the state change, which meant we
could dump most of the database.
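
Roughly speaking, the change was something like this (an illustrative
schema, not our real one): instead of one row per report even when
nothing has changed, we keep one row per state change with a validity
window:

   CREATE TABLE incident_state(
     incident_id INTEGER,
     state       TEXT,
     first_seen  TEXT,  -- when this state first appeared
     last_seen   TEXT   -- bumped while the same state keeps repeating
   );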

It was something that we hadn't appreciated a few years ago, and the
ever-increasing database meant we had to do something about the DB size.

It's taken a few months of work, mainly in a dark room with a wet towel
on our foreheads, to make the tiny changes necessary for a big
difference.

In hindsight, the fact that SQLite is so easy and fast to use was a
slight disadvantage to us: it allowed us to be a bit quick and dirty
with designs when we should have thought through some of the issues.
However, that's what startups are like ;)

I think all the explanations to date have been helpful, and we
appreciate the time taken to answer; we're going to think a little more
carefully about how we manage our database on a VM. I'm in the process
of moving home, so the only real hardware (of any note) is sitting in
storage, and the only testing we can do is either virtualised or on Mac
laptops, neither of which will help us in this instance.

Rob
