Incremental backup/sync facility?

18 messages
Incremental backup/sync facility?

Stephan Buchert-2

We are using SQLite for data from satellite Earth observations. It
works very well. Thanks to everybody contributing to SQLite, foremost
Dr. Hipp.

The largest database file has now grown to about 180 GB. I need to have
copies of the files in at least two different places. The databases are
updated regularly as new data from the satellites become available.

Having the copies of the file synced becomes increasingly tedious
as their sizes increase. Ideal would be some kind of
incremental backup/sync facility.

I looked at https://www.sqlite.org/backup.html, but, as it does
a complete backup, this takes a long time for such large
databases (files are mounted over networks), and other work gets
blocked.

An alternative is perhaps https://www.sqlite.org/sqldiff.html, but it
is also slow for large databases. Most of the data are binary, and the
SQL input/output of sqldiff/sqlite3 is probably not very efficient.

A solution might be to add an extension to the standard VFS
http://www.sqlite.org/vfs.html. Each page would need a counter which
increments when the page is modified (and there seems to
be spare space for such counters). Then the backup could test which
pages need to be updated. Is there any chance that such an extension
could be added?

A kind of hack-ish solution might be to update the primary database
files in WAL mode, copy only the WAL file to the secondary place,
and force a WAL checkpoint there. Would this work?

/Stephan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Incremental backup/sync facility?

Hick Gunter
May I suggest using a statement log of the "primary" database to update the secondary sites? If you have a sequence counter (separate or embedded) then the state of the database could be recovered from a backup (sequence counter x) plus the log files for all statements from x+1 to "current".
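Sketched with Python's built-in sqlite3 module (the obs schema and function names here are invented for illustration), the sequence-counter idea might look like this:

```python
import sqlite3

def make_db():
    # Both sites start from the same state; schema is invented for illustration.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE obs(id INTEGER PRIMARY KEY, val TEXT)")
    con.execute("CREATE TABLE stmt_log(seq INTEGER PRIMARY KEY, sql TEXT NOT NULL)")
    return con

def logged_write(con, sql):
    # Execute a modification and record it under the next sequence number.
    con.execute(sql)
    con.execute("INSERT INTO stmt_log(sql) VALUES (?)", (sql,))
    con.commit()

def catch_up(secondary, primary, last_applied):
    # Replay everything after the secondary's high-water mark, in order.
    rows = primary.execute(
        "SELECT seq, sql FROM stmt_log WHERE seq > ? ORDER BY seq",
        (last_applied,)).fetchall()
    for seq, sql in rows:
        secondary.execute(sql)
        last_applied = seq
    secondary.commit()
    return last_applied

primary, secondary = make_db(), make_db()
logged_write(primary, "INSERT INTO obs(val) VALUES ('A')")
logged_write(primary, "UPDATE obs SET val='B' WHERE id=1")
mark = catch_up(secondary, primary, 0)
```

A restored backup taken at sequence x would simply call catch_up with last_applied = x.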



___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.



Re: Incremental backup/sync facility?

Simon Slavin-3
In reply to this post by Stephan Buchert-2

On 6 May 2016, at 1:32pm, Stephan Buchert <[hidden email]> wrote:

> The largest database file has now grown to about 180 GB. I need to have
> copies of the files at at least two different places. The databases are
> updated regularly as new data from the satellites become available.
>
> Having the copies of the file synced becomes increasingly tedious
> as their sizes increase. Ideal would be some kind of
> incremental backup/sync facility.

Believe it or not, the fastest way to synchronise the databases is not to synchronise the databases.  Instead you keep a log of the instructions used to modify the database.  You might, for example, modify the library that you use for INSERT, DELETE and UPDATE commands to execute those commands and also save the command to another 'commandLog' table.  Or perhaps just append those commands to a plain text file.

Then instead of sending any data to the other sites you send this list of commands to the other sites and have them execute them.

Once you start implementing this you'll see that it's more complicated than I have described but the text of your post suggests that you're a good enough programmer to do it properly.

This assumes that the structure and primary keys of the tables which hold data are constructed in such a way that the order in which new data is entered doesn't matter.
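A rough sketch of the plain-text-file variant, using Python's bundled sqlite3 module (table names and file paths invented for illustration):

```python
import os
import sqlite3
import tempfile

logfile = os.path.join(tempfile.mkdtemp(), "commands.sql")

def run_and_log(con, sql):
    # Execute the modification command and append it to the command log.
    con.execute(sql)
    con.commit()
    with open(logfile, "a") as f:
        f.write(sql + ";\n")

site_a = sqlite3.connect(":memory:")
site_a.execute("CREATE TABLE data(k TEXT PRIMARY KEY, v TEXT)")
run_and_log(site_a, "INSERT INTO data VALUES ('k1', 'v1')")
run_and_log(site_a, "UPDATE data SET v='v2' WHERE k='k1'")

# At the other site: ship the small log file, not the big database, and replay it.
site_b = sqlite3.connect(":memory:")
site_b.execute("CREATE TABLE data(k TEXT PRIMARY KEY, v TEXT)")
with open(logfile) as f:
    site_b.executescript(f.read())
```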

Simon.

Re: Incremental backup/sync facility?

Richard Hipp-3
In reply to this post by Stephan Buchert-2
On 5/6/16, Stephan Buchert <[hidden email]> wrote:
>
> A kind of hack-ish solution might be to update the primary database
> files in WAL mode, copy only the WAL file to the secondary place,
> and force a WAL checkpoint there. Would this work?
>

This sounds like the most promising solution to me.  We'll think on it some....
--
D. Richard Hipp
[hidden email]

Re: Incremental backup/sync facility?

Richard Hipp-3
In reply to this post by Simon Slavin-3
On 5/6/16, Simon Slavin <[hidden email]> wrote:
>
> Believe it or not, the fastest way to synchronise the databases is not to
> synchronise the databases.  Instead you keep a log of the instructions used
> to modify the database.

Or, this might be an even better solution.  Note that the
sqlite3_trace() function (see
https://www.sqlite.org/c3ref/profile.html) can help here.
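For illustration, Python's sqlite3 module exposes a comparable hook, Connection.set_trace_callback(), wrapping the C-level trace interface; depending on the SQLite and Python versions, the traced text may show placeholders rather than the bound values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
statements = []
# The callback is invoked with each SQL statement as it begins executing,
# so appending to a list (or a log file) captures the modification stream.
con.set_trace_callback(statements.append)

con.execute("CREATE TABLE t(x INTEGER)")
con.execute("INSERT INTO t VALUES (?)", (42,))
```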

A third solution would be to use the session extension, which will be
merged to trunk on the next release.  See
https://www.sqlite.org/c3ref/profile.html for details.  One advantage
of using the session extension is that it will work even if the two
database files are not identical at the onset.  A disadvantage is that
the session extension only works with rowid tables, not WITHOUT ROWID
tables.

--
D. Richard Hipp
[hidden email]

Re: Incremental backup/sync facility?

Richard Hipp-3
On 5/6/16, Richard Hipp <[hidden email]> wrote:

> A third solution would be to use the session extension, which will be
> merged to trunk on the next release.  See
> https://www.sqlite.org/c3ref/profile.html

I bungled the copy/paste of that URL.  Should be
https://www.sqlite.org/draft/sessionintro.html


--
D. Richard Hipp
[hidden email]

The Session Extension (future SQLite extension)

Simon Slavin-3

On 6 May 2016, at 1:51pm, Richard Hipp <[hidden email]> wrote:

> I bungled the copy/paste of that URL.  Should be
> https://www.sqlite.org/draft/sessionintro.html

That's very interesting.  I note that

<https://www.sqlite.org/draft/session/sqlite3changeset_invert.html>

says "This function does not change the order in which changes appear within the changeset. It merely reverses the sense of each individual change.".

This will not be useful to anyone trying to undo a set of changes to one row.

CREATE TABLE testTable (theText TEXT);
INSERT INTO testTable VALUES ('A');
-- session starts here
UPDATE testTable SET theText='B'; -- changes A to B
UPDATE testTable SET theText='C'; -- changes B to C
-- session ends here

Inverting that without reversing the order will yield

-- changes B to A (the documentation suggests that this might be ignored?)
-- changes C to B

The result would be to leave the value at B, which was true neither at the beginning nor end of the session.

Are we meant to iterate backwards through the inverted changeset?  I don't see a way to do it.

Simon.

Re: Incremental backup/sync facility?

Gerry Snyder-4
In reply to this post by Stephan Buchert-2
On 5/6/2016 5:32 AM, Stephan Buchert wrote:
> We are using Sqlite for data from satellite Earth observations. It
> works very well. Thanks to everybody contributing to Sqlite, uppermost
> Dr. Hipp.
>
> The largest database file has now grown to about 180 GB.

One feature of SQLite -- the whole database in one file -- is normally
an advantage but becomes less so when the file is huge.


>   I need to have
> copies of the files at at least two different places. The databases are
> updated regularly as new data from the satellites become available.

Others have suggested keeping track of changes using a logging file.
That allows all the SELECT statements to remain unchanged, while
complicating the input side of things.

Another approach could be to have new observations go into a separate
file (one day's worth, or whatever makes sense). This much smaller file
could be shipped to the other site(s) and then merged into each copy of
the main database. Almost no changes to the input logic, but every
SELECT would have to use a JOIN. No idea how painful a process that would
be.
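A minimal sketch of the merge step, using Python's sqlite3 module (file names and the obs schema are invented for illustration):

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
main_path = os.path.join(tmp, "main.db")
day_path = os.path.join(tmp, "day.db")

# A day's worth of observations arrives in its own small file.
day = sqlite3.connect(day_path)
day.execute("CREATE TABLE obs(ts INTEGER PRIMARY KEY, val TEXT)")
day.executemany("INSERT INTO obs VALUES (?, ?)", [(1, "a"), (2, "b")])
day.commit()
day.close()

# At each site, merge the shipped file into the big main database.
main = sqlite3.connect(main_path)
main.execute("CREATE TABLE IF NOT EXISTS obs(ts INTEGER PRIMARY KEY, val TEXT)")
main.execute("ATTACH DATABASE ? AS day", (day_path,))
main.execute("INSERT OR IGNORE INTO obs SELECT ts, val FROM day.obs")
main.commit()
main.execute("DETACH DATABASE day")
```

Only the small daily file crosses the network; the 180 GB file never moves.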

> [...]
Gerry

Re: Incremental backup/sync facility?

Simon Slavin-3

On 6 May 2016, at 3:40pm, Gerry Snyder <[hidden email]> wrote:

> One feature of SQLite -- the whole database in one file -- is normally an advantage but becomes less so when the file is huge.

Believe me.  It's still a huge advantage.  Have you ever tried to copy a MySQL database off a non-working server by figuring out which files contained it ?  That's three days of my life I wish I could forget.

Simon.

Re: Incremental backup/sync facility?

Stephen Chrzanowski
In reply to this post by Gerry Snyder-4
Gerry;

I trashed the email I was going to send.  You had the same line of thought
as me regarding chopping the file on a per-day basis, but what made me
trash it was that auto-numbered PKs would be a hassle in new files,
unless that information was carried into the new DB upon creation.

I agree that when files get large, how the data is stored has to be
re-examined, because backing up the sucker becomes more of a logistical
nightmare.  With MySQL, data replication like that is somewhat of a
breeze.  With SQLite, the convenience and portability it grants die when
files become too large to deal with for backup purposes.

On Fri, May 6, 2016 at 10:40 AM, Gerry Snyder <[hidden email]> wrote:

> One feature of SQLite -- the whole database in one file -- is normally an
> advantage but becomes less so when the file is huge.
> [...]

Re: Incremental backup/sync facility?

Jean-Luc Hainaut
In reply to this post by Simon Slavin-3
On 06/05/2016 at 14:43, Simon Slavin wrote:

>Believe it or not, the fastest way to synchronise the databases is not to synchronise the databases.  Instead you keep a log of the instructions used to modify the database. [...]

This does not need to be so complicated: the problem can be solved by three triggers (per table) that record the components of each INSERT, UPDATE and DELETE command in a log table. At defined time points, the contents of this table are used to generate the SQL data-modification commands for the other DB.
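A sketch of such triggers, run through Python's sqlite3 module (the obs schema and trigger names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE obs(id INTEGER PRIMARY KEY, val TEXT);
-- One log table records the components of every modification.
CREATE TABLE change_log(seq INTEGER PRIMARY KEY, op TEXT, id INTEGER, val TEXT);
CREATE TRIGGER obs_ai AFTER INSERT ON obs BEGIN
  INSERT INTO change_log(op, id, val) VALUES ('I', NEW.id, NEW.val);
END;
CREATE TRIGGER obs_au AFTER UPDATE ON obs BEGIN
  INSERT INTO change_log(op, id, val) VALUES ('U', NEW.id, NEW.val);
END;
CREATE TRIGGER obs_ad AFTER DELETE ON obs BEGIN
  INSERT INTO change_log(op, id, val) VALUES ('D', OLD.id, NULL);
END;
""")
con.execute("INSERT INTO obs(val) VALUES ('x')")
con.execute("UPDATE obs SET val='y' WHERE id=1")
log = con.execute("SELECT op, id, val FROM change_log ORDER BY seq").fetchall()
```

Each row of change_log can then be turned back into an INSERT, UPDATE or DELETE against the other database, after which the log is truncated.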

J-L Hainaut



Re: Incremental backup/sync facility?

Roger Binns
In reply to this post by Stephan Buchert-2
On 06/05/16 05:32, Stephan Buchert wrote:
> Having the copies of the file synced becomes increasingly tedious
> as their sizes increase. Ideal would be some kind of
> incremental backup/sync facility.

Out of curiosity, would an approach of using multiple databases and
using ATTACH to "unify" them work?  The individual databases could
represent shorter amounts of time (e.g. a week) and, as you state, smaller
files are easier to deal with.
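A sketch of that approach via Python's sqlite3 module (file names and the obs schema invented for illustration), unifying the pieces behind a view so queries see one table:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, "week1.db"), os.path.join(tmp, "week2.db")]

# Each shorter period lives in its own, smaller database file.
for i, p in enumerate(paths, start=1):
    con = sqlite3.connect(p)
    con.execute("CREATE TABLE obs(ts INTEGER PRIMARY KEY, val TEXT)")
    con.execute("INSERT INTO obs VALUES (?, ?)", (i, "week%d" % i))
    con.commit()
    con.close()

# One front-end connection attaches the pieces and unifies them behind a view.
front = sqlite3.connect(":memory:")
front.execute("ATTACH DATABASE ? AS w1", (paths[0],))
front.execute("ATTACH DATABASE ? AS w2", (paths[1],))
front.execute("""CREATE TEMP VIEW all_obs AS
                 SELECT * FROM w1.obs UNION ALL SELECT * FROM w2.obs""")
rows = front.execute("SELECT ts, val FROM all_obs ORDER BY ts").fetchall()
```

Only the current small file ever needs syncing; older files are immutable and already replicated.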

Roger




Re: Incremental backup/sync facility?

Jeffrey Mattox
In reply to this post by Simon Slavin-3
As an aside, this is how Apple syncs Core Data to iCloud (and then to multiple iOS devices) if the backing store uses SQLite (the default).  When a small amount of data changes (which is common), the changes get sent out, not the entire (mostly unchanged and potentially huge) database.

Jeff


> On May 6, 2016, at 7:43 AM, Simon Slavin <[hidden email]> wrote:
>
> Believe it or not, the fastest way to synchronise the databases is not to synchronise the databases. [...]

Re: Incremental backup/sync facility?

Jim Morris
Doesn't this eliminate the use of prepared statements?


On 5/6/2016 11:10 AM, Jeffrey Mattox wrote:

> As an aside, this is how Apple syncs Core Data to iCloud (and then to multiple iOS devices) if the backing store uses SQLite (the default). [...]


Re: Incremental backup/sync facility?

Hick Gunter
No, you just have to log the bound parameters and a reference to the prepared statement (so the other side will know which statement to prepare).
Or just log the statement and the parameters each time.
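A sketch of statement-id logging in Python's sqlite3 module (the statement registry, ids and obs schema are invented for illustration):

```python
import json
import sqlite3

# Both sides agree on a registry of prepared statements keyed by stable ids.
STATEMENTS = {
    1: "INSERT INTO obs(val) VALUES (?)",
    2: "UPDATE obs SET val=? WHERE id=?",
}

def make_site():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE obs(id INTEGER PRIMARY KEY, val TEXT)")
    return con

primary = make_site()
wire_log = []  # shipped to the other side: (statement id, bound parameters)

def run(con, stmt_id, params):
    con.execute(STATEMENTS[stmt_id], params)
    con.commit()
    wire_log.append((stmt_id, json.dumps(params)))

run(primary, 1, ["a"])
run(primary, 2, ["b", 1])

# The other side prepares the referenced statement and binds the logged parameters.
secondary = make_site()
for stmt_id, packed in wire_log:
    secondary.execute(STATEMENTS[stmt_id], json.loads(packed))
secondary.commit()
```

Shipping ids plus parameters keeps the log compact and lets each side keep using prepared statements.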




Re: Incremental backup/sync facility?

Richard Hipp-3
On 5/6/16, Hick Gunter <[hidden email]> wrote:
> No,you just have to log the bound parameters and a reference to the prepared
> statement (so the other side will know which statement to prepare).
> Or just log the statement & the parameters each time.

The sqlite3_trace() interface fills in the values for the parameters
automatically.
--
D. Richard Hipp
[hidden email]

Re: The Session Extension (future SQLite extension)

Zsbán Ambrus
In reply to this post by Simon Slavin-3
As for the session extension
"https://www.sqlite.org/draft/sessionintro.html", what I'd like to ask
is when it is more useful than the RBU extension
"http://sqlite.org/rbu.html"?  The two seem to serve a similar
purpose.

Ambrus

Re: The Session Extension (future SQLite extension)

Dave Wellman
My apologies if this has already been raised...

When I first read about this I thought (assumed) that 'changes' were meant
to be just data changes, i.e. changes to rows where they have been inserted,
updated and/or deleted. In my experience this is a fairly typical journaling
function that a number of DBMSs offer. A typical use case for this would
be: run the 'real' processing on the production system, copy the journal
information (what is referred to as the 'changeset'  or 'patchset') to a
backup system and then apply the same changes to that system. This is one
way of keeping a disaster recovery environment up to date.

However, the documentation includes the following: "They work all day, in
parallel, each making their own customizations and tweaks to the design."

Does the "to the design" imply that this feature would also cater for DDL
changes to tables, indexes etc.?

Cheers,
Dave


Ward Analytics Ltd - information in motion
Tel: +44 (0) 118 9740191
Fax: +44 (0) 118 9740192
www: http://www.ward-analytics.com

Registered office address: The Oriel, Sydenham Road, Guildford, Surrey,
United Kingdom, GU1 3SR
Registered company number: 3917021 Registered in England and Wales.
