Backup and integrity check questions

9 messages

Backup and integrity check questions

Rune Torgersen
We have an application that keeps multiple SQLite3 databases (30-40) open with exclusive locking.
Every night we do a database backup and a database integrity check.
The backup is done using sqlite3_backup_*. The check is done using a "PRAGMA integrity_check;"

Currently we allow reads/writes while doing both.

Management wants to know if we're doing it correctly, or if there are faster/easier ways to do backup (and check).
Also wants to know if a backup done using the live backup API gives us an exact copy (with any possible corruption) or if the backup is regenerated.
The checks are done to detect some issues we have had with corrupted databases (we think we fixed the issue, but are running the checks to make sure).
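For readers following along, the nightly routine described above can be sketched with the sqlite3 module from Python's standard library, whose Connection.backup wraps the same sqlite3_backup_* C API; the paths and helper name here are illustrative, not from the original application:

```python
import sqlite3

def nightly_backup_and_check(src_path, dest_path):
    """Back up one live database, then verify the copy."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Equivalent of sqlite3_backup_init/step/finish: copies the
        # database page by page while the source stays usable.
        src.backup(dest)
        # Run the integrity check against the copy, not the live file.
        row = dest.execute("PRAGMA integrity_check;").fetchone()
        return row[0] == "ok"
    finally:
        src.close()
        dest.close()
```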

The databases have been opened with two connections (one for reads, one for writes), and use the following options:
    sqlite3_busy_timeout(mDbConn, 500);
    sqlite3_exec(mDbConn, "PRAGMA locking_mode = EXCLUSIVE;", 0, 0, 0);
    sqlite3_exec(mDbConn, "PRAGMA synchronous = NORMAL;", 0, 0, 0);
    sqlite3_exec(mDbConn, "PRAGMA journal_mode = TRUNCATE;", 0, 0, 0);

    sqlite3_busy_timeout(mReadDbConn, 500);
    sqlite3_exec(mReadDbConn, "PRAGMA read_uncommitted = 1;", 0, 0, 0);

Thanks,
Rune Torgersen
Innovative Systems LLC.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Backup and integrity check questions

Simon Slavin-3
On 25 Jul 2018, at 10:56pm, Rune Torgersen <[hidden email]> wrote:

> Management wants to know if we're doing it correctly, or if there are faster/easier ways to do backup (and check).

Please excuse me mentioning things I'm sure you are already doing correctly.  Your question is useful to many users and this response will be read by users of all levels of experience.

The following answers assume that your code notes the result code returned by each call to the API and checks that it is SQLITE_OK or SQLITE_DONE as appropriate.  Use an ASSERT, or write your own code that makes this check and writes a log entry or terminates with an error message on an unexpected result.  If you are not doing this, fix it as a high priority.

What you describe doing now will work correctly if the backup API is allowed to do its job.  The backup API has to restart each time a change is made to the database.  If your logs show that the backup completes successfully for each of your files, then in your case there is enough time between changes to allow this to happen, perhaps not at the first attempt, but soon enough.
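That restart behaviour can be watched from application code. Below is a hedged sketch using Python's stdlib sqlite3 module (its Connection.backup drives sqlite3_backup_step internally); the chunk size and callback are illustrative:

```python
import sqlite3

def backup_in_steps(src: sqlite3.Connection, dest: sqlite3.Connection):
    """Copy the database a chunk of pages at a time.

    pages=64 mirrors a loop of sqlite3_backup_step(p, 64) in the C API:
    between chunks the read lock on the source is released so other
    connections can proceed; a write through another connection makes
    SQLite restart the copy from the beginning on the next step.
    """
    def progress(status, remaining, total):
        # Called after each chunk; 'remaining' climbs back up toward
        # 'total' whenever the copy has been restarted.
        pass

    src.backup(dest, pages=64, progress=progress)
```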

Is it possible to suspend execution (and close all connections) while the databases are being backed up?  If so, it may be faster to close all connections to a database, use an OS call to duplicate the file, then start things going again.  Restarting the server may also work around resource leaks and other programming errors.  But your setup may rely on 24/7 availability, making this impossible, or it might be programmatically difficult to get into the guts of your program and tell it to detach and suspend execution.

Once your backups are taken you can run integrity checks on the /backup/ copies of the files, allowing the main program to proceed without interruption.  This assumes you are maintaining at least two generations of backup.

> Also wants to know if a backup done using the live backup API gives us an exact copy (with any possible corruption) or if the backup is regenerated.

The backup API copies each page of the source database to the destination database, from end to end.  It even copies pages marked as not in use (e.g. pages freed by deleted rows).  It has no understanding of the structure of the database file and, if I understand the documentation correctly, would copy corrupt sequences without noticing them.

Since you are asking questions about backup, it's worth mentioning that backup copies should be in different storage to the databases, preferably to a server or external drive.  You're probably doing this already, but I have seen a disturbing number of cases where a drive failure lost both database and backup because there was no budget for an external drive, or a manager wanted all desks to have the same hardware.

Simon.

Re: Backup and integrity check questions

Rowan Worth-2
In reply to this post by Rune Torgersen
On 26 July 2018 at 05:56, Rune Torgersen <[hidden email]> wrote:

> The databases have been opened with two connections (one for reads, one
> for writes), and use the following options:
>     sqlite3_busy_timeout(mDbConn, 500);
>     sqlite3_exec(mDbConn, "PRAGMA locking_mode = EXCLUSIVE;", 0, 0, 0);
>

Surely this effectively reduces your number of connections to one? After
the write connection performs its first update, the read connection will be
locked out forever. Unless the write connection is specifically opened to
perform the update and then closed, in which case the PRAGMA is superfluous?

PRAGMA quick_check is faster, but not as exhaustive as integrity_check.
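As a small illustration of the two pragmas, here is a hedged sketch using Python's stdlib sqlite3 module; the helper name is made up. quick_check skips the expensive cross-check of index content against table content (and UNIQUE verification), which is why it runs faster:

```python
import sqlite3

def check_database(path, quick=False):
    """Return True if the database passes SQLite's self-check.

    quick_check omits the index-content and UNIQUE verification that
    integrity_check performs, trading thoroughness for speed.
    """
    pragma = "quick_check" if quick else "integrity_check"
    conn = sqlite3.connect(path)
    try:
        # Both pragmas return a single row containing "ok" on success,
        # or one row per problem found otherwise.
        return conn.execute("PRAGMA %s;" % pragma).fetchall() == [("ok",)]
    finally:
        conn.close()
```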

There are more efficient ways to copy a DB than the backup API, but they
involve either downtime, as Simon said, or some risk in correctly managing
locks.

It's not clear whether you have another process calling sqlite3_backup_* or
that is done by the application itself. The latter can be more efficient as
sqlite will not have to restart the backup if the application updates the
DB (any writes to a DB are automatically propagated to in-progress backups
within the same process).

-Rowan

Re: Backup and integrity check questions

Simon Slavin-3
On 27 Jul 2018, at 6:02am, Rowan Worth <[hidden email]> wrote:

> (any writes to a DB are automatically propagated to in-progress backups
> within the same process).

I didn't know that.  Thanks.  It's clever.

Simon.

Re: Backup and integrity check questions

chelle103@sky.com
In reply to this post by Rune Torgersen

--------------------------------------------
On Fri, 27/7/18, Simon Slavin <[hidden email]> wrote:

> On 27 Jul 2018, at 6:02am, Rowan Worth <[hidden email]> wrote:
>
> > (any writes to a DB are automatically propagated to in-progress backups
> > within the same process).
>
> I didn't know that.  Thanks.  It's clever.
>
> Simon.

Re: Backup and integrity check questions

Richard Hipp-3
In reply to this post by Rune Torgersen
On 7/25/18, Rune Torgersen <[hidden email]> wrote:

> Management wants to know if we're doing it correctly, or if there are
> faster/easier ways to do backup (and check).
> Also wants to know if a backup done using the live backup API gives us an
> exact copy (with any possible corruption) or if the backup is regenerated.

Seems right.  If a change happens to the database in the middle of a
backup, the backup automatically restarts, so that at the end of a
backup you are guaranteed to have a consistent copy from a single
point in time (something that you are not guaranteed to have if you
copy the file externally).


--
D. Richard Hipp
[hidden email]

Re: Backup and integrity check questions

R Smith-2
On 2018/07/27 12:51 PM, Richard Hipp wrote:
>
> Seem right.  If a change happens to the database in the middle of a
> backup, the backup automatically restarts, so that at the end of a
> backup you are guaranteed to have a consistent copy from a single
> point in time (something that you are not guaranteed to have if you
> copy the file externally).

Unless of course you close all connections to the DB first, right?

I think the consideration comes to this:

Using BackUp API:
Advantages
-- No need to close all connections, can back-up a live DB.
-- Will re-start a backup if DB writes take place during.
-- Produces a consistent state file.
-- Can be done using only internal sqlite API (no manual file-handling).

Caveats
-- If updates happen very often on a large-ish DB, the backup may
constantly re-start and never complete.

Using file-system Copy:
Advantages
-- Quick to backup (OS/FS caching assist)
-- Consistent state single file (provided all DB connections are closed)
-- No restarting, but DB must remain closed (for writing at least) for
the duration of the Copy.

Caveats
-- Need to do your own file-system handling / error checking.
-- DB has to be closed to force the consistent state single file before
copying (commit journal files to main DB file).
-- No restarting, so no DB write transactions during copy process.


Conclusion:  Unless you have the rare case of a constantly + frequently
updated large DB, use the API backup.


(If I've missed a point, or got it wrong, someone please amend it)
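The file-system-copy route from the comparison above can be sketched as follows (Python stdlib; this assumes every connection has already been closed, so the journal has been resolved and the main file is the complete, consistent database):

```python
import shutil
import sqlite3

def offline_copy_backup(src_path, dest_path):
    """Duplicate a closed database with a plain OS-level file copy.

    Only safe while no connection holds the source open: closing the
    last connection commits or rolls back the journal, leaving a
    single consistent file for the copy to capture.
    """
    shutil.copyfile(src_path, dest_path)
    # Verify the duplicate before trusting it as a backup.
    conn = sqlite3.connect(dest_path)
    try:
        return conn.execute("PRAGMA integrity_check;").fetchone()[0] == "ok"
    finally:
        conn.close()
```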



Re: Backup and integrity check questions

Rune Torgersen
In reply to this post by Rune Torgersen
> Rowan Worth Thu, 26 Jul 2018 22:02:50 -0700
>
> On 26 July 2018 at 05:56, Rune Torgersen <[hidden email]> wrote:
>
> > The databases have been opened with two connections (one for reads, one
> > for writes), and use the following options:
> >     sqlite3_busy_timeout(mDbConn, 500);
> >     sqlite3_exec(mDbConn, "PRAGMA locking_mode = EXCLUSIVE;", 0, 0, 0);
> >
>
> Surely this effectively reduces your number of connections to one?

No, both connections are from within the same application, and have seemed to work just fine for about 10 years now...

>
> PRAGMA quick_check is faster, but not as exhaustive as integrity_check.
>
> There's more efficient ways to copy a DB than the backup api, but they
> involve either downtime as Simon said, or some risk in correctly managing
> locks.

The application has uptime sometimes measured in years, so closing it down for backups is not feasible.


> It's not clear whether you have another process calling sqlite3_backup_* or
> that is done by the application itself. The latter can be more efficient as
> sqlite will not have to restart the backup if the application updates the
> DB (any writes to a DB are automatically propagated to in-progress backups
> within the same process).

The backup is done from a separate thread within the same process, as is the integrity check.

Rune Torgersen
Innovative Systems LLC

Re: Backup and integrity check questions

Rowan Worth-2
On 28 July 2018 at 05:41, Rune Torgersen <[hidden email]> wrote:

> > Rowan Worth Thu, 26 Jul 2018 22:02:50 -0700
> >
> > On 26 July 2018 at 05:56, Rune Torgersen <[hidden email]> wrote:
> >
> > > The databases have been opened with two connections (one for reads, one
> > > for writes), and use the following options:
> > >     sqlite3_busy_timeout(mDbConn, 500);
> > >     sqlite3_exec(mDbConn, "PRAGMA locking_mode = EXCLUSIVE;", 0, 0, 0);
> > >
> >
> > Surely this effectively reduces your number of connections to one?
>
> No, both connections are from within the same application, and have seemed
> to work just fine for about 10 years now...
>

If the write connection is held open I honestly can't see how that's
possible, unless it is never actually used to write.

I tried a quick test with two connections on different threads configured
like this, and as soon as the writer obtains an exclusive lock for the
first time it holds onto it forever (as described in the PRAGMA docs). From
that point on the reader cannot even obtain a SHARED lock to read the
database, and spends the rest of its life in the busy handler.

Note that while the pragma talks about excluding other "processes" from
accessing the DB, "process" and "thread" are interchangeable as far as
sqlite is concerned. That is documented here:
https://www.sqlite.org/lockingv3.html

Last paragraph of section 2.0:

>
> The pager module effectively controls access for separate threads, or
> separate processes, or both. Throughout this document whenever the word
> "process" is written you may substitute the word "thread" without changing
> the truth of the statement.
>


Ah, unless you're in shared cache mode (which PRAGMA read_uncommitted would
suggest), which I don't really know anything about...
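Rowan's experiment is easy to reproduce with two connections in one process (no shared cache). A sketch using Python's stdlib sqlite3 module; the file path and helper name are illustrative:

```python
import sqlite3

def demonstrate_exclusive_lockout(path):
    """Show that locking_mode=EXCLUSIVE shuts out a second connection.

    After the writer's first write the exclusive lock is retained even
    across commit, so a second connection cannot obtain the SHARED
    lock it needs just to read.
    """
    writer = sqlite3.connect(path)
    writer.execute("PRAGMA locking_mode = EXCLUSIVE;")
    writer.execute("CREATE TABLE t(x)")
    writer.execute("INSERT INTO t VALUES (1)")
    writer.commit()  # lock is kept, not released, in EXCLUSIVE mode

    # Roughly sqlite3_busy_timeout(mReadDbConn, 500) from the original post.
    reader = sqlite3.connect(path, timeout=0.5)
    try:
        reader.execute("SELECT x FROM t").fetchone()
        return False  # read succeeded: no lockout
    except sqlite3.OperationalError:
        return True   # "database is locked": reader timed out in the busy handler
    finally:
        reader.close()
        writer.close()
```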

-Rowan