iOS Watchdog and database corruption

Deon Brewis
Interesting case of database corruption on iOS here.

Our main thread was waiting for a worker thread to exit. The worker thread was doing a sqlite3Close, which in turn did a checkpoint.

The application got watchdog terminated by iOS because the main thread was taking too long (waiting for the sqlite3close on the worker thread). The resultant force close seems to have aborted SQLITE in such a way that it caused the database to be corrupted.

Worker thread stack:
Thread 4:
    ftruncate: external code (libsystem_kernel.dylib)
    unixTruncate: sqlite3.c @ 34036
    sqlite3WalCheckpoint: sqlite3.c @ 56846
    sqlite3WalClose: sqlite3.c @ 56955
    sqlite3PagerClose: sqlite3.c @ 51556
    sqlite3BtreeClose: sqlite3.c @ 62169
    sqlite3LeaveMutexAndCloseZombie: sqlite3.c @ 142752
    sqlite3Close: sqlite3.c @ 0

SQLITE3 version is 3.20.1. Database size is around 5 GB.

Couple of questions:

a) Is it expected that an app crash / force terminate in the middle of a SQLITE3 checkpoint like this can cause corruption?

b) Is there a way I can do a close without triggering a checkpoint? (In order to speed up close, so that it doesn't trigger a watchdog).

- Deon

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: iOS Watchdog and database corruption

Richard Hipp-3
On 2/21/18, Deon Brewis <[hidden email]> wrote:
>
> a) Is it expected that an app crash / force terminate in the middle of a
> SQLITE3 checkpoint like this can cause corruption?

No.  See, for example, https://www.sqlite.org/atomiccommit.html and
https://www.sqlite.org/wal.html.  If the filesystem is behaving
properly, and assuming no other application tries to "clean up" after
a app crash, then the database will be automatically restored to a
consistent state the next time it is opened.  This is extensively
tested.

Usually issues like this come back to either filesystem bugs or the
watchdog, or some other component, going in and deleting the -wal file
in an effort to be helpful and "clean up" after the application crash,
and thereby deleting information that SQLite needs to recover,
resulting in a corrupt database.

Other ways in which the database file can go corrupt:
https://www.sqlite.org/howtocorrupt.html

>
> b) Is there a way I can do a close without triggering a checkpoint? (In
> order to speed up close, so that it doesn't trigger a watchdog).
>

Set the SQLITE_DBCONFIG_NO_CKPT_ON_CLOSE option.
https://www.sqlite.org/c3ref/c_dbconfig_enable_fkey.html

--
D. Richard Hipp
[hidden email]

Re: iOS Watchdog and database corruption

Simon Slavin-3
On 21 Feb 2018, at 2:35pm, Deon Brewis <[hidden email]> wrote:

> The application got watchdog terminated by iOS because the main thread was taking too long (waiting for the sqlite3close on the worker thread). The resultant force close seems to have aborted SQLITE in such a way that it caused the database to be corrupted.
>
> Worker thread stack:
> Thread 4:
>    ftruncate: external code (libsystem_kernel.dylib)
>    unixTruncate: sqlite3.c @ 34036
>    sqlite3WalCheckpoint: sqlite3.c @ 56846
>    sqlite3WalClose: sqlite3.c @ 56955
>    sqlite3PagerClose: sqlite3.c @ 51556
>    sqlite3BtreeClose: sqlite3.c @ 62169
>    sqlite3LeaveMutexAndCloseZombie: sqlite3.c @ 142752
>    sqlite3Close: sqlite3.c @ 0

I'm puzzled by this.  iOS gives applications quite a long time to terminate before calling "kill" on them.  Had "applicationWillTerminate" been called ?  Was it definitely your main thread (via thread 4) which was delaying the termination, and not another thread ?  You should find the offending thread identified further up in that same report, just before it starts listing the call-stacks of each thread.

If the offending call really was "sqlite3LeaveMutexAndCloseZombie" then you may have some sort of mismanagement in your code.  Or it might be just a once-in-a-blue-moon problem which will never occur again.

The rest I leave up to the devs.  SQLite should not be corrupting a database just because it was unexpectedly terminated, no matter what it was doing when terminated.  It was written to avoid that and no amount of testing has shown such a bug.

> SQLITE3 version is 3.20.1. Database size is around 5 GB.

You have a 5 GB database on a device which may have a 16 GB capacity ?  I assume you know what you're doing.

Simon.

Re: iOS Watchdog and database corruption

Deon Brewis
Yes, definitely the main thread - we close down the database during applicationWillTerminate. It gives us 5 seconds to exit before it triggers the watchdog.

Termination Description: SPRINGBOARD, process-exit watchdog transgression: xxx exhausted real (wall clock) time allowance of 5.00 seconds |  | ProcessVisibility: Foreground | ProcessState: Running | WatchdogEvent: process-exit | WatchdogVisibility: Foreground | WatchdogCPUStatistics: ( | "Elapsed total CPU time (seconds): 3.740 (user 3.740, system 0.000), 25% CPU", | "Elapsed application CPU time (seconds): 0.049, 0% CPU" | )

Triggered by Thread:  0

Thread 0 crashed:
    __semwait_signal: external code (libsystem_kernel.dylib)
    nanosleep: external code (libsystem_c.dylib)
    +[NSThread sleepForTimeInterval:]: external code (Foundation)
    Database::signalCloseAndWait()
    App::~App()
    -[AppDelegate applicationWillTerminate:]: appdelegate.mm @ 377
    -[UIApplication _terminateWithStatus:]: external code (UIKit)


"If the offending call really was "sqlite3LeaveMutexAndCloseZombie" then you may have some sort of mismanagement in your code"

What do you mean by that? Is it abnormal for sqlite3close to call sqlite3LeaveMutexAndCloseZombie?

- Deon

Re: iOS Watchdog and database corruption

Stephen Chrzanowski
That number doesn't surprise me.  At my company, one of our products is
built around iPads.  Airlines give their pilots 16-32GB iPads to bring into
the cockpit to look at maps, charts, weather info, etc.  The iPads
essentially become EFBs (Electronic Flight Bags).  Compressed, we push
two or three gig of PDFs, images, and proprietary information of different
format structures, probably some of which is SQLite, to the devices on
initial deployment, and then push update packages going forward.  Once
received, those packages are decompressed and put into place.

A single 5gig database?  Not a big deal if the device is being used for a
very specific purpose.

On Wed, Feb 21, 2018 at 10:22 AM, Simon Slavin <[hidden email]> wrote:

>
> > SQLITE3 version is 3.20.1. Database size is around 5 GB.
>
> You have a 5 GB database on a device which may have a 16 GB capacity ?  I
> assume you know what you're doing.
>
> Simon.

Re: iOS Watchdog and database corruption

Simon Slavin-3
On 21 Feb 2018, at 3:34pm, Deon Brewis <[hidden email]> wrote:

> Yes, definitely the main thread - we close down the database during applicationWillTerminate. It gives us 5 seconds to exit before it triggers the watchdog.

Okay, that all sounds right, and the dump you pasted suggested everything worked right.  I know a lot about the iOS application cycle, somewhat less about SQLite.

>> "If the offending call really was "sqlite3LeaveMutexAndCloseZombie" then you may have some sort of mismanagement in your code"
>
> What do you mean by that? Is it abnormal for sqlite3close to call sqlite3LeaveMutexAndCloseZombie?

Sorry, I didn't mean it like that.  My concern was that it's abnormal for "sqlite3LeaveMutexAndCloseZombie" to take five seconds to execute.  This is rare, and suggested that perhaps some other part of your application (maybe your own code, maybe SQLite code) had abandoned the mutex.  But it seems you're doing everything right and the 5 second delay mystifies me.

DRH says that a crash of any sort should not be corrupting the database.  If you can reliably demonstrate it happening, I'm sure it's something he'd like to investigate.

Simon.

Re: iOS Watchdog and database corruption

Jens Alfke-2


> On Feb 21, 2018, at 7:45 AM, Simon Slavin <[hidden email]> wrote:
>
> My concern was that it's abnormal for "sqlite3LeaveMutexAndCloseZombie" to take five seconds to execute.

As of a few weeks ago, I know all about this function ;-) It's called when the last statement is closed on a "zombie" database connection that's already had sqlite3_close_v2 called on it; it performs the actual close that was deferred. It's taking a long time because it's calling sqlite3WalCheckpoint.

But it is scary that the database file got corrupted. Deon, do you still have the corrupted file(s) available for forensics?

—Jens

Re: iOS Watchdog and database corruption

Deon Brewis
I do.

I'll have to request permission from the customer though to share it - who will potentially be looking at the file? (Just so I can share names and background with the customer to put him at ease).

- Deon

Re: iOS Watchdog and database corruption

Richard Hipp-3
On 2/21/18, Deon Brewis <[hidden email]> wrote:
> I do.
>
> I'll have to request permission from the customer though to share it - who
> will potentially be looking at the file? (Just so I can share names and
> background with the customer to put him at ease).

You can send corrupt database files (and corresponding journals)
directly to my private email and they will be shared only among the
SQLite developers: me, Dan, and Joe.
--
D. Richard Hipp
[hidden email]