Btree page corruption

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Btree page corruption

Nikhil Deshpande
Hi,

We have an application that in a Linux VM that's running into
SQLite DB corruption (after weeks and months of running,
4 such instances yet in different VMs).

We would appreciate some help in debugging this further to identify
source of corruption!

Symptom is btree page corruption, e.g.

 > $ sqlite3 stats.sqlite "pragma integrity_check;"
 > *** in database main ***
 > Page 3818: btreeInitPage() returns error code 11
 > Page 46: btreeInitPage() returns error code 11
 > Error: database disk image is malformed
(Same error is raised for SELECT queries too.)

There were no power-off or reboots in near time vicinity when the
corruption was detected. We have poured over this document
https://sqlite.org/howtocorrupt.html
many times to check if any of the conditions could apply,
but so far no leads yet.

We have also been unable to reproduce the corruption by stressing
application's SQLite DB read/write code paths for a week.

I'm attaching showdb output for the DB header and 2 corrupt pages
if it's of any hint.

---

A bit more application setup context/information:

- Linux kernel 4.4.41
- glibc 2.22
- Ext4 file system, mounted as (rw,relatime,data=ordered).

- Writer C++ process: sqlite-3.17
   - Creates a set of "time series" tables, each table has 2 numeric
     columns (timestamp, int) during initialization.
   - Every 1 minute, 2 threads will do total 15 writes. (using "INSERT OR
     REPLACE ... (timestamp, int)" SQL into 15 tables).
   - SQLite DB opened with SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE
     flags, initialized with "PRAGMA journal_mode=wal;", threading mode
     is Serialized for the libsqlite build, uses default VFS ("unix").
     All other config params are default (e.g. autovacuum is disabled
     etc.).
   - A separate thread runs "PRAGMA quick_check;" periodically every 5
     minutes, in its own separate DB connection.
- Reader process: sqlite-3.11 + Python 2.7.11
   - Periodically reads time series tables for a given timestamp range
     (usually latest 5 minutes) using SELECT queries (no INSERT/UPDATE/
     DELETE from this process).
   - Uses same same "PRAGMA journal_mode=wal", uses the sqlite3 DBAPI
     module from Python standard library.
Apart from above 2, no other processes are accessing the SQLite DB file.

We have updated both the reader and writer to use latest SQLite 3.21,
but without understanding the cause of corruption, we are unable to
say if this update to latest 3.21 would indeed prevent further
occurrences.

Thanks,
  Nikhil

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

showdb_dbheader.txt (2K) Download Attachment
showdb_pgidx_corruptnodes.txt (160 bytes) Download Attachment
showdb_46b.txt (1K) Download Attachment
showdb_46bm.txt (7K) Download Attachment
showdb_3818b.txt (578 bytes) Download Attachment
showdb_3818bm.txt (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Simon Slavin-3
On 21 Dec 2017, at 9:47pm, Nikhil Deshpande <[hidden email]> wrote:

> We have an application that in a Linux VM that's running into
> SQLite DB corruption (after weeks and months of running,
> 4 such instances yet in different VMs).
>
> [snip]
>
> There were no power-off or reboots in near time vicinity when the
> corruption was detected. We have poured over this document
> https://sqlite.org/howtocorrupt.html
> many times to check if any of the conditions could apply,
> but so far no leads yet.
>
> We have also been unable to reproduce the corruption by stressing
> application's SQLite DB read/write code paths for a week.

Well done for finding the appropriate web page.  You’ve done all the easy stuff which leaves us with just the hard questions.  These won’t necessarily give us an answer, but they might remind one of your readers of something.

When "pragma integrity_check" detects an error, does "PRAGMA quick_check" do too ?

Is your database file really about 65 Meg in size ?  Just roughly.

How is your application terminated ?  User action or self-determination ?

Does your application call sqlite3_close() ?  Does it cbeck the code returned ?

Does your application call sqlite3_shutdown() ?  Does it cbeck the code returned ?

How does your VM get shutdown ?  Does it correctly shut down your application before it quits ?

Can you include a "pragma integrity_check" at startup ?
Can you include a "pragma integrity_check" executed at regular intervals ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Richard Hipp-3
In reply to this post by Nikhil Deshpande
On 12/21/17, Nikhil Deshpande <[hidden email]> wrote:
>
> There were no power-off or reboots in near time vicinity when the
> corruption was detected.

(1) Might the corruption have been sitting dormant due to some far
away power-off or reboot and was only recently discovered?  How much
do you trust the fsync() system call and the filesystems ability to
honor that system call on the VM on which this code is running?  Do
you run "PRAGMA quick_check" on initial power-up?

(2) Is this the only corruption instance you have seen?

(3) The showdb outputs you sent only shows me the first 8 bytes of the
page.  From those 8 bytes, it looks to be an overflow page that is
being used as a btree page.  Are there any further clues in the hex
dump of the bad pages?

(4) What does "showdb pgidx" look like?

(5) How many threads are accessing the database?  Are you sure that
you have the database set in a thread-safe mode?


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Rowan Worth-2
In reply to this post by Nikhil Deshpande
Does either process take backups of the DB? If so, how is that implemented?
-Rowan

On 22 December 2017 at 05:47, Nikhil Deshpande <[hidden email]>
wrote:

> Hi,
>
> We have an application that in a Linux VM that's running into
> SQLite DB corruption (after weeks and months of running,
> 4 such instances yet in different VMs).
>
> We would appreciate some help in debugging this further to identify
> source of corruption!
>
> Symptom is btree page corruption, e.g.
>
> > $ sqlite3 stats.sqlite "pragma integrity_check;"
> > *** in database main ***
> > Page 3818: btreeInitPage() returns error code 11
> > Page 46: btreeInitPage() returns error code 11
> > Error: database disk image is malformed
> (Same error is raised for SELECT queries too.)
>
> There were no power-off or reboots in near time vicinity when the
> corruption was detected. We have poured over this document
> https://sqlite.org/howtocorrupt.html
> many times to check if any of the conditions could apply,
> but so far no leads yet.
>
> We have also been unable to reproduce the corruption by stressing
> application's SQLite DB read/write code paths for a week.
>
> I'm attaching showdb output for the DB header and 2 corrupt pages
> if it's of any hint.
>
> ---
>
> A bit more application setup context/information:
>
> - Linux kernel 4.4.41
> - glibc 2.22
> - Ext4 file system, mounted as (rw,relatime,data=ordered).
>
> - Writer C++ process: sqlite-3.17
>   - Creates a set of "time series" tables, each table has 2 numeric
>     columns (timestamp, int) during initialization.
>   - Every 1 minute, 2 threads will do total 15 writes. (using "INSERT OR
>     REPLACE ... (timestamp, int)" SQL into 15 tables).
>   - SQLite DB opened with SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE
>     flags, initialized with "PRAGMA journal_mode=wal;", threading mode
>     is Serialized for the libsqlite build, uses default VFS ("unix").
>     All other config params are default (e.g. autovacuum is disabled
>     etc.).
>   - A separate thread runs "PRAGMA quick_check;" periodically every 5
>     minutes, in its own separate DB connection.
> - Reader process: sqlite-3.11 + Python 2.7.11
>   - Periodically reads time series tables for a given timestamp range
>     (usually latest 5 minutes) using SELECT queries (no INSERT/UPDATE/
>     DELETE from this process).
>   - Uses same same "PRAGMA journal_mode=wal", uses the sqlite3 DBAPI
>     module from Python standard library.
> Apart from above 2, no other processes are accessing the SQLite DB file.
>
> We have updated both the reader and writer to use latest SQLite 3.21,
> but without understanding the cause of corruption, we are unable to
> say if this update to latest 3.21 would indeed prevent further
> occurrences.
>
> Thanks,
>  Nikhil
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Nikhil Deshpande
In reply to this post by Simon Slavin-3
Thanks Simon for the response!

On 12/21/17 5:05 PM, Simon Slavin wrote:
> When "pragma integrity_check" detects an error, does "PRAGMA quick_check" do too ?
Yes, both pragmas return same error.

> Is your database file really about 65 Meg in size ?  Just roughly.
Yes:
$ du -sh *
66M appliance_stats.sqlite
32K appliance_stats.sqlite-shm
4.2M appliance_stats.sqlite-wal

num_pages X page_size seem to match file system metadata, fsck has not
revealed any inconsistencies.

> How is your application terminated ?  User action or self-determination ?
The writer process receives a SIGTERM and in that handler signals a
condition that is waited by all other threads (when not doing any work).
If the threads are doing any work, they finish it and then wait on the
condition. Threads are not abnormally terminated.
The writer process runs till the VM is shutdown.

> Does your application call sqlite3_close() ?  Does it cbeck the code returned ?
>
> Does your application call sqlite3_shutdown() ?  Does it cbeck the code returned ?
Yes, the return codes are logged and used as process exit code, our logs
do not show non-zero error codes/logs.

> How does your VM get shutdown ?  Does it correctly shut down your application before it quits ?
VM shutdown is normal shutdown with systemd shutting down a simple
watchdog that forwards shutdown to the SQLite DB writer process.

> Can you include a "pragma integrity_check" at startup ?
> Can you include a "pragma integrity_check" executed at regular intervals ?
The writer process does "pragma quick_check" on every startup at init,
bails out on failure and spawns a separate thread to do same
"pragma quick_check" every 5 minutes (opens it's own separate DB handle
and closes it). Would changing quick_check to integrity_check be
helpful? (Would integrity_check catch such corruption earlier than
quick_check? Would it hold longer exclusive locks on the DB file
that could prevent reads?)


Thanks,
  Nikhil
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Nikhil Deshpande
In reply to this post by Richard Hipp-3
Thanks Richard for the response!

On 12/21/17 5:07 PM, Richard Hipp wrote:

> On 12/21/17, Nikhil Deshpande <[hidden email]> wrote:
>>
>> There were no power-off or reboots in near time vicinity when the
>> corruption was detected.
>
> (1) Might the corruption have been sitting dormant due to some far
> away power-off or reboot and was only recently discovered?  How much
> do you trust the fsync() system call and the filesystems ability to
> honor that system call on the VM on which this code is running?  Do
> you run "PRAGMA quick_check" on initial power-up?
The previous reboot was almost 4 days before the corruption was
detected by the reader process. Trawling through the logs for those
4 days showed no anomalies or different behavior (the writer was
functioning and inserting data as expected, writer did not detect
the corruption that the reader process ran into, our guess is that
the writer was not touching the corrupt btree pages, writing to some
other part of the DB file).

> (2) Is this the only corruption instance you have seen?
We have seen such 4 instances on 4 different VMs in path 4 months.
The corruption symptom is same, btree page header lacking the correct
bits set.

> (3) The showdb outputs you sent only shows me the first 8 bytes of the
> page.  From those 8 bytes, it looks to be an overflow page that is
> being used as a btree page.  Are there any further clues in the hex
> dump of the bad pages?

Attaching full dump of the corrupt pages 46 and 3818.
This page 46 seems to contain reasonable strings (written by the app,
metadata about the timeseries tables), not garbage/arbitrary data.
Page 3818 (belongs to a timeseries table) seems to contain zeros
except at the head 16 bytes and tail end 16 bytes of the page.

> (4) What does "showdb pgidx" look like?
Attaching "showdb pgidx" output, command runs fine (exiting with 0),
showing these 2 pages as corrupt:
    46: corrupt node [sm_stats_metadata], child 2 of page 2
  3818: root corrupt node [sm_stat_disk_latency_rate_dm-9]

> (5) How many threads are accessing the database?  Are you sure that
> you have the database set in a thread-safe mode?
There are total 3 threads in the writer process accessing the DB,
two threads do periodic "INSERT OR REPLACE (timestamp, val)" into
bunch of tables. The third thread runs "pragma quick_check"
periodically.

Logging sqlite3_threadsafe() output shows value 1 (serialized mode
build time setting) and sqlite3_db_mutex() on a db handle shows
a non-NULL value. The DB file is opened using sqlite3_open_v2()
with SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE flags, it does
not explicitly pass SQLITE_OPEN_FULLMUTEX flag (relies on the compile
time threading mode to be inherited, which is serialized threading
mode build time setting).

Thanks,
  Nikhil

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

showdb_46.txt (24K) Download Attachment
showdb_3818.txt (24K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Nikhil Deshpande
In reply to this post by Rowan Worth-2
On 12/21/17 9:45 PM, Rowan Worth wrote:
> Does either process take backups of the DB? If so, how is that implemented?

Thanks Rowan for the response!
Backup is done by a separate process through command:
    sqlite3 /path/to/db_file .dump > dump.sql
and not using the sqlite3 backup API.

Thanks,
  Nikhil
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Simon Slavin-3
In reply to this post by Nikhil Deshpande


On 27 Dec 2017, at 6:10pm, Nikhil Deshpande <[hidden email]> wrote:

>> Can you include a "pragma integrity_check" at startup ?
>> Can you include a "pragma integrity_check" executed at regular intervals ?
> The writer process does "pragma quick_check" on every startup at init,
> bails out on failure and spawns a separate thread to do same
> "pragma quick_check" every 5 minutes (opens it's own separate DB handle
> and closes it). Would changing quick_check to integrity_check be
> helpful? (Would integrity_check catch such corruption earlier than
> quick_check? Would it hold longer exclusive locks on the DB file
> that could prevent reads?)

Thanks Nikhil.

Had I noticed your email address I wouldn’t have asked picky questions about abnormal termination and VM management.

Your responses indicate that you are doing everything properly.  I don’t see anything there that makes me think that you’re missing a cause of corruption.  However people who know more than I do about SQLite might spot something.

Changing "quick_check" to "integrity_check" may help in this case may be useful in investigating your problem.  "integrity_check" is far more thorough and checks every aspect of data integrity from both directions.  However, the type of corruption you’re experiencing may be one which is spotted just as well by "quick_check".  There’s no way to know without checking it.

But it does suffer from the problem you noted (a 65 Meg database would be locked for longer) and this might be unacceptable in your environment.  Or you might be able to do it just once an hour instead of every 5 minutes.

An alternative might be to run "integrity_check" on backup copies which don’t show up anything on "quick_check".  This could be done without blocking the production system.  If you never find anything then you know "quick_check" is all you need.  But it might spot corruption in some databases you don’t think have been corrupted yet.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Simon Slavin-3
On 27 Dec 2017, at 6:55pm, Simon Slavin <[hidden email]> wrote:

> An alternative might be to run "integrity_check" on backup copies which don’t show up anything on "quick_check".  This could be done without blocking the production system.  If you never find anything then you know "quick_check" is all you need. But it might spot corruption in some databases you don’t think have been corrupted yet.

Just to be clear, SQLite database files can be corrupted in ways that don’t show up until a particular row of a particular index is used.  If a row pointer of a particular index becomes corrupt, and something tries to UPDATE that row, SQLite will corrupt the file even more, causing corruption to spread to the table, and to other indexes of that table.

This is why using a database without problems for months, then suddenly having a problem, does not mean that the database only just became corrupt.

The only way to check everything (at least, the developers hope it’s everything) is "PRAGMA integrity_check".

Note: the above is simplified for the sake of simplicity.  SQLite does a certain amount of internal checking.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Rowan Worth-2
In reply to this post by Simon Slavin-3
On 28 December 2017 at 02:55, Simon Slavin <[hidden email]> wrote:

> On 27 Dec 2017, at 6:10pm, Nikhil Deshpande <[hidden email]> wrote:
>
> >> Can you include a "pragma integrity_check" at startup ?
> >> Can you include a "pragma integrity_check" executed at regular
> intervals ?
> > The writer process does "pragma quick_check" on every startup at init,
> > bails out on failure and spawns a separate thread to do same
> > "pragma quick_check" every 5 minutes (opens it's own separate DB handle
> > and closes it). Would changing quick_check to integrity_check be
> > helpful? (Would integrity_check catch such corruption earlier than
> > quick_check? Would it hold longer exclusive locks on the DB file
> > that could prevent reads?)
>
> Changing "quick_check" to "integrity_check" may help in this case may be
> useful in investigating your problem.  "integrity_check" is far more
> thorough and checks every aspect of data integrity from both directions.
> However, the type of corruption you’re experiencing may be one which is
> spotted just as well by "quick_check".  There’s no way to know without
> checking it.
>

I haven't checked the code but the docs say:

https://sqlite.org/pragma.html#pragma_quick_check

> The pragma is like integrity_check
<https://sqlite.org/pragma.html#pragma_integrity_check> except that it does
not verify UNIQUE constraints and does
> not verify that index content matches table content. By skipping UNIQUE
and index consistency checks,
> quick_check is able to run much faster than integrity_check. Otherwise
the two pragmas are the same.

Based on my personal experience/testing, quick_check still reads the entire
DB. The only type of corruption I've ever seen integrity_check report which
quick_check didn't was an index lacking an entry for a particular ROWID (in
an otherwise well-formed DB). Coupled with Nikhil's tests I'm certain
quick_check will catch this type of corruption.

So the only scenario left to worry about is an index corruption (ie. one
only detected by integrity_check) happening first, which somehow leads to a
DB page corruption AND masks the original index corruption. Doesn't seem
likely, but switching to integrity_check would rule out any chance.

It does take longer than quick_check, but how much longer will depend on
how big your indices are. On a 2GB DB I've measured 200 seconds for
quick_check vs 300 seconds for integrity_check, but that was over NFS.

I don't think either check takes an EXCLUSIVE lock on the DB? But a
competing writer might, which ends up locking out readers until the check
finishes (and the write completes).


I agree with Simon you're not doing anything obviously wrong... This is
listed in the how to corrupt page, but I'll ask anyway because it's a
subtle one: do any of your processes open the database file, for any
reason, without going through sqlite's API?

-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Btree page corruption

Simon Slavin-3


On 29 Dec 2017, at 4:10am, Rowan Worth <[hidden email]> wrote:

> do any of your processes open the database file, for any
> reason, without going through sqlite's API?

Just to note that a major offender in this respect is anti-virus software.  So don’t think just of things that might want to read a SQLite database, but think of anything that might read all files, or changed files.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users