Expedited vacuum tool?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Expedited vacuum tool?

David Raymond
After reading the below comment in vacuum.c I'm wondering if anyone has written a simple tool to do the faster vacuum mentioned in the comment? Given of course that no one's going to be accessing the file while it's running. A fair chunk of my stuff tends to be loading a bunch of things into a brand new database, doing analyze and vacuum, then making the file read only and releasing it to be queried for reference. So a faster vacuum would be nice. It would also eliminate my need for the "depreciated" temp_store_directory pragma when vacuuming databases that are larger than the free space on my C drive where the default temp folder is.

"Only 1x temporary space and only 1x writes would be required if
the copy of step (3) were replaced by deleting the original database
and renaming the transient database as the original.  But that will
not work if other processes are attached to the original database.
And a power loss in between deleting the original and renaming the
transient would cause the database file to appear to be deleted
following reboot."

In the same train of thought: What's the difference between .backup and .clone in the CLI, and would either of those do the equivalent?
.backup ?DB? FILE      Backup DB (default "main") to FILE
.clone NEWDB           Clone data into NEWDB from the existing database
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Expedited vacuum tool?

Keith Medcalf
On Thursday, 15 June, 2017 09:40, David Raymond <[hidden email]> wrote:

> In the same train of thought: What's the difference between .backup and
> .clone in the CLI, and would either of those do the equivalent?
> .backup ?DB? FILE      Backup DB (default "main") to FILE
> .clone NEWDB           Clone data into NEWDB from the existing database

.backup makes a backup of the database file using the backup API call, effectively copying all the pages (without looking at them to much) from main to the new file.

.clone "copies" the data to a new database file by effectively doing the following:

open the newfile
begin exclusive on newfile
for each sqlstatement in main.sqlite_master
   if type=='table'
      execute retrieved sql in newfile
      prepare select * from main.table
      prepare insert or replace into newfile.table
      step select from main.table
      while (SQLITE_ROW)
        bind values from selected row to prepared insert
        step insert statement
        step select statement
for each sqlstatement in main.sqlite_master
   if type != 'table'
      execute retrieved sql in newfile
commit newfile
close newfile

So, backup will be faster.  clone may permit data to be recovered from corrupted databases.  Sort of like how dump/load works, only without the overhead of writing a dump file and reloading it, or generating SQL statements.  In most cases similar to:

sqlite3 main.db .dump | sqlite3 newfile.db







_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Expedited vacuum tool?

R Smith
They both cause a new copy-of-the-source-db file, but...


On 2017/06/15 6:07 PM, Keith Medcalf wrote:
> On Thursday, 15 June, 2017 09:40, David Raymond <[hidden email]> wrote:
>
>> In the same train of thought: What's the difference between .backup and
>> .clone in the CLI, and would either of those do the equivalent?
>> .backup ?DB? FILE      Backup DB (default "main") to FILE
>> .clone NEWDB           Clone data into NEWDB from the existing database
> .backup makes a backup of the database file using the backup API call, effectively copying all the pages (without looking at them to much) from main to the new file.

Indeed - It's also worth noting that the backup is able to be used
within a live database situation. i.e. it doesn't hold an exclusive
lock. It will transfer pages to the backed-up DB file while you can
still use the DB as normal, and, if there are any writes to the source
DB while being backed up, the backup API will realise this and start
over. Once it completes, you have the most recent state of the DB.
The BACKUP is an API functionality available to any process linking
against the SQLite source code.

.clone is (AFAICT) a construct of the CLI and not an API interface, it
simply transfers data via a custom CLI process using SQL API functions
as Keith explained below.

Basically, these are two very different means to a similar end and many
situations can be fathomed where one will be superior to the other and
vice versa.

>
> .clone "copies" the data to a new database file by effectively doing the following:
>
> open the newfile
> begin exclusive on newfile
> for each sqlstatement in main.sqlite_master
>     if type=='table'
>        execute retrieved sql in newfile
>        prepare select * from main.table
>        prepare insert or replace into newfile.table
>        step select from main.table
>        while (SQLITE_ROW)
>          bind values from selected row to prepared insert
>          step insert statement
>          step select statement
> for each sqlstatement in main.sqlite_master
>     if type != 'table'
>        execute retrieved sql in newfile
> commit newfile
> close newfile
>
> So, backup will be faster.  clone may permit data to be recovered from corrupted databases.  Sort of like how dump/load works, only without the overhead of writing a dump file and reloading it, or generating SQL statements.  In most cases similar to:
>
> sqlite3 main.db .dump | sqlite3 newfile.db
>
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Expedited vacuum tool?

Simon Slavin-3
In reply to this post by Keith Medcalf


On 15 Jun 2017, at 5:07pm, Keith Medcalf <[hidden email]> wrote:
>
> .backup makes a backup of the database file using the backup API call, effectively copying all the pages (without looking at them to much) from main to the new file.

However, I think this lacks one of the advantages of VACUUM: it won’t reclaim unused space in the pages it copies.  The copy will be the same size as the original.

If you’re absolutely certain that nobody else is accessing your database, I wonder whether some of the PRAGMAs might speed up VACUUM.

Simon,
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Expedited vacuum tool?

Donald Griggs
In reply to this post by David Raymond
Regarding: depreciated *temp_store_directory* pragma:


It seems that setting your TEMP environment variable to the location of
your temporary
storage area should let you dispense with the deprecated pragma.

http://sqlite.1065341.n5.nabble.com/pragma-temp-store-directory-is-deprecated-what-is-the-alternative-td83280.html


On Thu, Jun 15, 2017 at 11:39 AM, David Raymond <[hidden email]>
wrote:

> After reading the below comment in vacuum.c I'm wondering if anyone has
> written a simple tool to do the faster vacuum mentioned in the comment?
> Given of course that no one's going to be accessing the file while it's
> running. A fair chunk of my stuff tends to be loading a bunch of things
> into a brand new database, doing analyze and vacuum, then making the file
> read only and releasing it to be queried for reference. So a faster vacuum
> would be nice. It would also eliminate my need for the "depreciated"
> temp_store_directory pragma when vacuuming databases that are larger than
> the free space on my C drive where the default temp folder is.
>
> "Only 1x temporary space and only 1x writes would be required if
> the copy of step (3) were replaced by deleting the original database
> and renaming the transient database as the original.  But that will
> not work if other processes are attached to the original database.
> And a power loss in between deleting the original and renaming the
> transient would cause the database file to appear to be deleted
> following reboot."
>
> In the same train of thought: What's the difference between .backup and
> .clone in the CLI, and would either of those do the equivalent?
> .backup ?DB? FILE      Backup DB (default "main") to FILE
> .clone NEWDB           Clone data into NEWDB from the existing database
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Expedited vacuum tool?

David Raymond
In reply to this post by Simon Slavin-3
I've got synchronous = off as the main one. Once in a while I'm tempted for journal_mode = off as well but that's just a little too risky for me, and making a copy to vacuum with it off would be about the same as having it on anyway. Apart from those nothing jumps out as a time saver, either pragmas or compile options.


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Thursday, June 15, 2017 1:46 PM
To: SQLite mailing list
Subject: Re: [sqlite] Expedited vacuum tool?



On 15 Jun 2017, at 5:07pm, Keith Medcalf <[hidden email]> wrote:
>
> .backup makes a backup of the database file using the backup API call, effectively copying all the pages (without looking at them to much) from main to the new file.

However, I think this lacks one of the advantages of VACUUM: it won’t reclaim unused space in the pages it copies.  The copy will be the same size as the original.

If you’re absolutely certain that nobody else is accessing your database, I wonder whether some of the PRAGMAs might speed up VACUUM.

Simon,
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...