sqlite db getting corrupt on power outage scenarios

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

sqlite db getting corrupt on power outage scenarios

Mayank Kumar (mayankum)
Hi Sqlite Users
I am running into a situation where sometimes after the power is restored my sqlite db running on linux is corrupted. Here is what I see with integrity_check:-

*** in database main ***
Page 895: btreeInitPage() returns error code 11
On tree page 2 cell 765: Child page depth differs
On tree page 2 cell 766: Child page depth differs

Whenever this occurs I have a journal file as well as the db file . I have the following questions:-


I am using the following sequence of statements to commit a tx to sqlite:-

1.       rc = sqlite3_open("test.db", &db);

2.       repeat in a loop for different key and blob

3.       lRc = sqlite3_exec(db, CREATE_TABLE, callback, (void*)data, &zErrMsg);

4.       sqlite3_prepare_v2(db, BEGIN_TX, strlen(BEGIN_TX), &beginTxStmt, NULL)

5.       sqlite3_prepare_v2(db, COMMIT_TX, strlen(COMMIT_TX), &commitTxStmt, NULL)

6.       sqlite3_prepare_v2(db, INSERT_RECORD, strlen(INSERT_RECORD), &storeStmt, NULL)

7.       sqlite3_step(beginTxStmt);

8.       sqlite3_reset(beginTxStmt);

9.       sqlite3_bind_int64(storeStmt, 1, key);

10.   sqlite3_bind_blob(storeStmt, 2, str, len, SQLITE_STATIC);

11.   sqlite3_step(storeStmt);

12.   sqlite3_reset(storeStmt);

13.   lRc = sqlite3_step(commitTxStmt);

14.   sqlite3_reset(commitTxStmt);

15.   end repeat


My questions are:-

1.       when the db gets corrupt, I expect  that the next read/write statement will look at the journal and recover from it . So I see the journal file does gets deleted after the power is restored and the sqlite tries to write again, but the sqlite db is still corrupt. Is there a way to verify the rollback using the journal file was successful or not ?

2.       Is there a way to verify if the journal file is correct and is itself not corrupt ?

3.       Assuming that the journal file is a copy of the original db before the transaction, is there a manual recovery possible using only the journal file. For e.g. can I delete the corrupted db and rename the journal file as the db file ? Are there tools which can convert the journal file to the sqlite db file ?

4.       is there a  way I can verify what data is causing the corruption ?

5.       can my application run some on the fly checks on the db after each write to verify the database is sane and not corrupted ?

6.       do you see any issues with the sequence of statements I am using to do multiple transactions ?


Sorry for too many questions , but this is a high priority issue for me and I want to make sure I have enough data to tackle it.


Cheers
Mayank
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Richard Hipp-3
On Wed, Nov 6, 2013 at 2:10 PM, Mayank Kumar (mayankum)
<[hidden email]>wrote:

> Hi Sqlite Users
> I am running into a situation where sometimes after the power is restored
> my sqlite db running on linux is corrupted. Here is what I see with
> integrity_check:-
>
> *** in database main ***
> Page 895: btreeInitPage() returns error code 11
> On tree page 2 cell 765: Child page depth differs
> On tree page 2 cell 766: Child page depth differs
>

SQLite is suppose to recover automatically from a power loss, rolling back
the last transaction, and NOT corrupting the database file.

However, in order to do this, SQLite relies on the filesystem behaving as
advertised.  Corruption such as you describe can result from a broken
fsync() implementation or from consumer-grade disk drives that lie about
having flushed their track buffers (saying that they have flushed their
track buffers to oxide when in fact they have not).  Please read more at

    http://www.sqlite.org/howtocorrupt.html

Please note that setting "PRAGMA journal_mode=WAL" makes SQLite more
resistent to disk-drive lies.  Not immune, but more resistant.  You might
want to set WAL mode if you have not done so already.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Mayank Kumar (mayankum)
Thanks Richard.
Are there known tools which tell us what data corrupted the db or some way of removing only the corrupted section of the db ?
I will definitely look into the wal mode. Are there any test results for disk failure robustness of this mode compared to journal mode.

I also saw some references to corruption being caused when the same db is written to by 3.6 and then being written to by 3.7. Is that true, because we have similar scenario where the db was originally written by 3.4.0 and when we upgraded our software the db was written to by 3.7.7.1. Do you think this scenario could also cause a db corruption.

-Mayank

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Richard Hipp
Sent: Wednesday, November 06, 2013 11:18 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] sqlite db getting corrupt on power outage scenarios

On Wed, Nov 6, 2013 at 2:10 PM, Mayank Kumar (mayankum)
<[hidden email]>wrote:

> Hi Sqlite Users
> I am running into a situation where sometimes after the power is
> restored my sqlite db running on linux is corrupted. Here is what I
> see with
> integrity_check:-
>
> *** in database main ***
> Page 895: btreeInitPage() returns error code 11 On tree page 2 cell
> 765: Child page depth differs On tree page 2 cell 766: Child page
> depth differs
>

SQLite is suppose to recover automatically from a power loss, rolling back the last transaction, and NOT corrupting the database file.

However, in order to do this, SQLite relies on the filesystem behaving as advertised.  Corruption such as you describe can result from a broken
fsync() implementation or from consumer-grade disk drives that lie about having flushed their track buffers (saying that they have flushed their track buffers to oxide when in fact they have not).  Please read more at

    http://www.sqlite.org/howtocorrupt.html

Please note that setting "PRAGMA journal_mode=WAL" makes SQLite more resistent to disk-drive lies.  Not immune, but more resistant.  You might want to set WAL mode if you have not done so already.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Richard Hipp-3
On Thu, Nov 7, 2013 at 6:50 PM, Mayank Kumar (mayankum)
<[hidden email]>wrote:

>
> I also saw some references to corruption being caused when the same db is
> written to by 3.6 and then being written to by 3.7. Is that true, because
> we have similar scenario where the db was originally written by 3.4.0 and
> when we upgraded our software the db was written to by 3.7.7.1. Do you
> think this scenario could also cause a db corruption.
>
>
Unlikely.  That bug was only present in version 3.7.0 and was fixed in
3.7.0.1.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Simon Slavin-3
In reply to this post by Mayank Kumar (mayankum)

On 7 Nov 2013, at 11:50pm, Mayank Kumar (mayankum) <[hidden email]> wrote:

> Are there known tools which tell us what data corrupted the db or some way of removing only the corrupted section of the db ?

Download the SQLite shell tool for your platform.
Use .dump to dump the database to a SQL command file.
Then use the same shell tool to create a blank database and .read those commands to read that data back in.

We have no way of knowing how much, if any, of the data from the original database this will recover.  Take a look at the text file yourself and see if you can figure out how much went missing.

> I will definitely look into the wal mode. Are there any test results for disk failure robustness of this mode compared to journal mode.

This won't help.  As long as you're using a network file system which does not support fsync() properly you will get more corruption.  Is this database on the hard disk in the computer running the SQLite application, or are you accessing it across a network ?  What OS and disk format are you using ?

>>> 1.       rc = sqlite3_open("test.db", &db);

change your app to look at this result code and check that it equals SQLITE_OK.  And do the same thing with the result codes returned by all your other sqlite3_ calls.  If any of them are not SQLITE_OK report the value and quit.

>>> 1.       when the db gets corrupt, I expect  that the next read/write statement will look at the journal and recover from it .

Depends on the nature of the corruption.  Some corruption can't be detected.

>>> So I see the journal file does gets deleted after the power is restored and the sqlite tries to write again, but the sqlite db is still corrupt. Is there a way to verify the rollback using the journal file was successful or not ?

Simply ignoring the journal file, or having had the journal file deleted should yield an uncorrupted database.  Possibly slightly out of date, but uncorrupted.  This is SQLite's fall-back situation.  Apparently it's not working in your case.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Richard Hipp-3
On Thu, Nov 7, 2013 at 7:57 PM, Simon Slavin <[hidden email]> wrote:

>
> > I will definitely look into the wal mode. Are there any test results for
> disk failure robustness of this mode compared to journal mode.
>
> This won't help.  As long as you're using a network file system which does
> not support fsync() properly you will get more corruption.
>

It does help, because the there are fewer times in WAL mode processing
where the order of write operations matter.  Hence, there are fewer
opportunities for a power failure or crash to cause problems.  But it is
not perfect, and corruption can still occur.  The only way to reliably
prevent corruption is to use a filesystem that never reorders writes across
an fsync() call.

The previous paragraph assumes that the cause of your corruption is a bug
in the filesystem that causes writes that happen before fsync() to actually
occur after one or more writes that were issued after the fsync().  If the
cause of corruption is something else, then WAL mode might not help as much.


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Mayank Kumar (mayankum)
In reply to this post by Richard Hipp-3
Thanks Richard.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Richard Hipp
Sent: Thursday, November 07, 2013 4:20 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] sqlite db getting corrupt on power outage scenarios

On Thu, Nov 7, 2013 at 6:50 PM, Mayank Kumar (mayankum)
<[hidden email]>wrote:

>
> I also saw some references to corruption being caused when the same db
> is written to by 3.6 and then being written to by 3.7. Is that true,
> because we have similar scenario where the db was originally written
> by 3.4.0 and when we upgraded our software the db was written to by
> 3.7.7.1. Do you think this scenario could also cause a db corruption.
>
>
Unlikely.  That bug was only present in version 3.7.0 and was fixed in 3.7.0.1.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Mayank Kumar (mayankum)
In reply to this post by Simon Slavin-3
Thanks Simon. See inline

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Thursday, November 07, 2013 4:57 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] sqlite db getting corrupt on power outage scenarios


On 7 Nov 2013, at 11:50pm, Mayank Kumar (mayankum) <[hidden email]> wrote:

> Are there known tools which tell us what data corrupted the db or some way of removing only the corrupted section of the db ?

Download the SQLite shell tool for your platform.
Use .dump to dump the database to a SQL command file.
Then use the same shell tool to create a blank database and .read those commands to read that data back in.

We have no way of knowing how much, if any, of the data from the original database this will recover.  Take a look at the text file yourself and see if you can figure out how much went missing.
[MK] Sounds interesting. Will look into this.

> I will definitely look into the wal mode. Are there any test results for disk failure robustness of this mode compared to journal mode.

This won't help.  As long as you're using a network file system which does not support fsync() properly you will get more corruption.  Is this database on the hard disk in the computer running the SQLite application, or are you accessing it across a network ?  What OS and disk format are you using ?
[MK] I am using a flash drive running on linux kernel 2.6.27.10(SMP). The corruption is sometimes seen on sqlite db on the flash drive and sometimes seen on sqlite db in nvram. Network file system is not involved.
>>> 1.       rc = sqlite3_open("test.db", &db);

change your app to look at this result code and check that it equals SQLITE_OK.  And do the same thing with the result codes returned by all your other sqlite3_ calls.  If any of them are not SQLITE_OK report the value and quit.
[MK] I do look at all return codes. I just pasted the sequence of apis not the actual code.

>>> 1.       when the db gets corrupt, I expect  that the next read/write statement will look at the journal and recover from it .

Depends on the nature of the corruption.  Some corruption can't be detected.

>>> So I see the journal file does gets deleted after the power is restored and the sqlite tries to write again, but the sqlite db is still corrupt. Is there a way to verify the rollback using the journal file was successful or not ?

Simply ignoring the journal file, or having had the journal file deleted should yield an uncorrupted database.  Possibly slightly out of date, but uncorrupted.  This is SQLite's fall-back situation.  Apparently it's not working in your case.
[MK] in my case the db is corrupted but there is also a journal file present. When I try to write a new transaction to the corrupted db, the new transaction is successful and at the same time the journal file gets deleted. The resultant db is still corrupted. Can we assume from this behavior that the corruption happened before the last failed transaction for which there is a journal file assuming that a journal file is created per transaction.
Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

Simon Slavin-3

On 8 Nov 2013, at 6:41am, Mayank Kumar (mayankum) <[hidden email]> wrote:

> [MK] I am using a flash drive running on linux kernel 2.6.27.10(SMP). The corruption is sometimes seen on sqlite db on the flash drive and sometimes seen on sqlite db in nvram. Network file system is not involved.

I have used SQLite extensively on a Macintosh, storing the data on a Flash drive and pulling the drive without warning to simulate hardware and power problems.  I have tested with Flash drives formatted in FAT format and other Flash drives formatted in HFS+ (A Macintosh native format.)  While I have lost new transactions by doing this, unless the format structure of the drive was damaged I have not ended up with any databases which fail

PRAGMA integrity_check

.  Note that it is important not to manually delete journal files after plugging the drive back in but before starting a SQLite app.

The one problem with this is, as noted, when the format structure of the drive is damaged, i.e. the drive fails in disk/folder/file-structure checking software like chkdsk/Disk Utility.  Under those situations, SQLite can attempt to rescue its database file but end up causing other problems because it is writing to corrupt workspace.  Cannot be solved inside SQLite.

> [MK] in my case the db is corrupted but there is also a journal file present. When I try to write a new transaction to the corrupted db, the new transaction is successful and at the same time the journal file gets deleted. The resultant db is still corrupted. Can we assume from this behavior that the corruption happened before the last failed transaction for which there is a journal file assuming that a journal file is created per transaction.

I'm not familiar enough with the inner workings of SQLite to give a definitive answer, but your description of what happens is the same as what I see and ...

<http://www.sqlite.org/tempfiles.html>

"The default journaling mode is DELETE, which is the default behavior of deleting the rollback journal file at the end of each transaction"

which suggests that you're right if you haven't changed your journal mode.  You may be able to use the PRAGMA I listed earlier to learn more about your setup.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

J. Merrill
In reply to this post by Mayank Kumar (mayankum)
Consumer-grade flash drives are notoriously unreliable. I suggest that (at a minimum) you test the drive with the software described at

http://sosfakeflash.wordpress.com/2008/09/02/h2testw-14-gold-standard-in-detecting-usb-counterfeit-drives/

Have you what happens to your flash drive (and its file system) when non-database applications are writing to files on the flash drive when a power outage occurs?

J. Merrill

Date: Fri, 8 Nov 2013 06:41:15 +0000
From: "Mayank Kumar (mayankum)" <[hidden email]>
To: General Discussion of SQLite Database <[hidden email]>
Subject: Re: [sqlite] sqlite db getting corrupt on power outage
        scenarios

[[small subset only]]
[MK] I am using a flash drive running on linux kernel 2.6.27.10(SMP). The corruption is sometimes seen on sqlite db on the flash drive and sometimes seen on sqlite db in nvram. Network file system is not involved.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: sqlite db getting corrupt on power outage scenarios

LincolnBurrows
This post has NOT been accepted by the mailing list yet.
In reply to this post by Mayank Kumar (mayankum)
you may have a look on this helpful article http://blog.systoolsgroup.com/sqlite-corruption-causes.html