Database corruption, and PRAGMA fullfsync on macOS

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Database corruption, and PRAGMA fullfsync on macOS

Jens Alfke-2
I’m seeing conflicting information about SQLite’s use of F_FULLFSYNC on macOS when committing transactions. This is making me nervous about durability and the possibility of database corruption.

The SQLite docs for PRAGMA fullfsync (https://www.sqlite.org/pragma.html#pragma_fullfsync <https://www.sqlite.org/pragma.html#pragma_fullfsync>) say:
>  This flag determines whether or not the F_FULLFSYNC syncing method is used on systems that support it. The default value of the fullfsync flag is off. Only Mac OS X supports F_FULLFSYNC.

I verified in the built-in sqlite3 tool on macOS 10.12.1 that the result of `pragma fullfsync` is 0.

My understanding is that issuing a F_FULLFSYNC ioctl call is important for ensuring that all writes have been flushed to physical storage, since disk controllers may lie and ignore a regular sync instruction, leaving some writes in their volatile cache memory. (I was working at Apple, on a project using SQLite, during the development of OS X 10.4 circa 2006; there were a number of databases that got corrupted by kernel panics or forced power-off, until Apple had the F_FULLFSYNC call added to SQLite.)

Apple’s documentation for Core Data (which uses SQLite) says: (https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/CoreData/PersistentStoreFeatures.html <https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/CoreData/PersistentStoreFeatures.html>)
> In OS X the fsync command does not guarantee that bytes are written, so SQLite sends a F_FULLFSYNC request to the kernel to ensure that the bytes are actually written through to the drive platter. This request causes the kernel to flush all buffers to the drives and causes the drives to flush their track caches. Without this, there is a significantly large window of time within which data will reside in volatile memory. If system failure occurs you risk data corruption.


This contradicts the SQLite docs, but it does match my understanding, and until today I thought that SQLite on macOS enabled fullfsync by default.

So which of these is true?
• F_FULLFSYNC is no longer necessary for full durability (due to some change in the way SQLite commits?)
or
• SQLite by default sacrifices durability and data integrity for performance by skipping the [slow] F_FULLFSYNC call

This is disturbing because we do have one developer using our framework who has been reporting occasional database corruption. He says it’s happened to users, and he claims to be able to make it happen by forcibly powering-down his laptop (though we haven’t been able to reproduce it.)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3

On 15 Nov 2016, at 6:11pm, Jens Alfke <[hidden email]> wrote:

> I verified in the built-in sqlite3 tool on macOS 10.12.1 that the result of `pragma fullfsync` is 0.

The default setting is 0.  But you can change it.  On my Mac running 10.12.x,

SQLite version 3.14.0 2016-07-26 15:17:14
Enter ".help" for usage hints.
sqlite> PRAGMA fullfsync;
0
sqlite> PRAGMA fullfsync=1;
sqlite> PRAGMA fullfsync;
1
sqlite> PRAGMA checkpoint_fullfsync;
1
sqlite>

My understanding is that F_FULLFSYNC still works the way you describe on a Mac and SQLite still uses it the way the documentation says.  But I'm not in touch with either development group.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Jens Alfke-2

> On Nov 15, 2016, at 10:57 AM, Simon Slavin <[hidden email]> wrote:
>
> My understanding is that F_FULLFSYNC still works the way you describe on a Mac and SQLite still uses it the way the documentation says.  But I'm not in touch with either development group.


This seems like a strange decision, considering that in other respects SQLite (wisely) defaults to settings that favor safety and durability over performance, e.g. synchronous mode.

So perhaps the “How To Corrupt A SQLite Database” page should be updated to add “Use SQLite on a Mac without knowing the secret that you should enable `pragma fullfsync`?” :-(


Incidentally, that page (https://www.sqlite.org/howtocorrupt.html <https://www.sqlite.org/howtocorrupt.html>) says:

>> Actually, if one is only concerned with atomic and consistent writes and is willing to forego durable writes, the sync operation does not need to wait until the content is completely stored on persistent media. Instead, the sync operation can be thought of as an I/O barrier. As long as all writes that occur before the sync are completed before any write that happens after the sync, no database corruption will occur.

So … based on my understanding (what I was told by a filesystem guru, albeit 10 years ago), a typical fsync does _not_ act as an I/O barrier. This is because it’s not really syncing (as the same page says, "most consumer-grade mass storage devices lie about syncing”). Disk controllers don’t write physical sectors in the order in which the writes are received; instead they buffer them, and then optimize the order of writes to minimize seeks. So if power is abruptly lost, it is not true that there’s a barrier before which all writes have been persisted and after which none have; instead the pattern is likely to be random.

The only way to guarantee a true barrier is to really-and-truly flush the disk controller, which requires not simply flushing but resetting it. That’s what F_FULLFSYNC on macOS does. (Unfortunately it makes the disk controller unresponsive for a macroscopic amount of time while it resets, which is why it’s slow. Worse, it adds latency to _all_ I/O on the system. Flush too often on a desktop computer and your music player will glitch, video playback will drop frames, etc.)

I’m totally willing to believe that these behaviors are different with SSDs, especially ones that use newer types of interfaces or are soldered directly to the motherboard (as on some MacBooks.) I’d love to be educated about the current state of the art :)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Jens Alfke-2
In reply to this post by Simon Slavin-3

> On Nov 15, 2016, at 10:57 AM, Simon Slavin <[hidden email]> wrote:
>
> sqlite> PRAGMA checkpoint_fullfsync;
> 1

I wasn’t aware of that pragma. Just tried it on my Mac (10.12.1), and its value is 1 even if I don’t first set pragma fullfsync; i.e. it defaults to 1. (Contradicting the docs.)

So it appears that on macOS, SQLite does use F_FULLFSYNC when checkpointing, but not at other times that it fsyncs. What does that mean in actual use, assuming that I always use WAL mode? Is there still an opportunity for corruption in the face of power failures?

(Sorry to be frothing at the mouth about this; but my team’s dealing with a few users/customers whose apps encounter db corruption, on Android as well as macOS, and we’re getting really frustrated trying to figure out what’s going on.)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3

On 15 Nov 2016, at 8:34pm, Jens Alfke <[hidden email]> wrote:

> On Nov 15, 2016, at 10:57 AM, Simon Slavin <[hidden email]> wrote:
>
>> sqlite> PRAGMA checkpoint_fullfsync;
>> 1
>
> I wasn’t aware of that pragma. Just tried it on my Mac (10.12.1), and its value is 1 even if I don’t first set pragma fullfsync; i.e. it defaults to 1. (Contradicting the docs.)
>
> So it appears that on macOS, SQLite does use F_FULLFSYNC when checkpointing, but not at other times that it fsyncs.

Yet the sequence I posted was from my own Macintosh, running 10.12.[something I can't talk about].  And as you see from my post the pragma starts off as 0 but if you set it to 1 it'll stay at 1.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3
In reply to this post by Jens Alfke-2

On 15 Nov 2016, at 8:18pm, Jens Alfke <[hidden email]> wrote:

> The only way to guarantee a true barrier is to really-and-truly flush the disk controller, which requires not simply flushing but resetting it. That’s what F_FULLFSYNC on macOS does. (Unfortunately it makes the disk controller unresponsive for a macroscopic amount of time while it resets, which is why it’s slow. Worse, it adds latency to _all_ I/O on the system. Flush too often on a desktop computer and your music player will glitch, video playback will drop frames, etc.)

It's worse than that.

Modern storage subsystems (hard disk or SSD) intended for use in a normal user computer always lie to the OS about flushing to disk.  The apparent increase in speed from doing this is so big that every manufacturer has to do it, or risk having every review harp on about how slow their computer performs in real life tasks.  There is no way to get these things to be honest.  Because not one person in ten thousand cares.  Most people are more interested in how fast they can launch Microsoft Word.

So if you're truly worried about flush-to-disk what do you do ?  Solution 1 is to buy hard disks rated for servers -- sometimes called "enterprise-class hard drives" -- and to set the DIP switches to tell them they're being used on a server.  Those things are intended for use in servers, and ACID can be important to a server, so they support it properly and do not lie.

Solution 2 works on any hard drive.  To make it flush cache just don't do anything that involves disk access for a couple of seconds.  They note the inactivity, then decide they have time to write away cache.

But if all the contact you have with the hard disk is through the OS ?  Forget it.  If power to the computer is but you're probably depending on the amount of power left in the system to flush data to disk.  And given the tiny power consumption those things have these days most of the time that's enough.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3
In reply to this post by Jens Alfke-2

On 15 Nov 2016, at 8:34pm, Jens Alfke <[hidden email]> wrote:

> (Sorry to be frothing at the mouth about this; but my team’s dealing with a few users/customers whose apps encounter db corruption, on Android as well as macOS, and we’re getting really frustrated trying to figure out what’s going on.)

Do these people admit they're letting their phones run out of power ?  If not, I suspect some sort of programming error somewhere.  Not necessarily in your own programming but sometimes in whatever database API you're using which eventually results in calls to the SQLite API.

So is your software in C or C++ and calling the SQLite API directly ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Bob Friesenhahn
In reply to this post by Simon Slavin-3
On Tue, 15 Nov 2016, Simon Slavin wrote:
>
> Modern storage subsystems (hard disk or SSD) intended for use in a
> normal user computer always lie to the OS about flushing to disk.
> The apparent increase in speed from doing this is so big that every
> manufacturer has to do it, or risk having every review harp on about
> how slow their computer performs in real life tasks.  There is no
> way to get these things to be honest.  Because not one person in ten
> thousand cares.  Most people are more interested in how fast they
> can launch Microsoft Word.

I don't think that things are as bad as you say.  Some modern
filesystems (e.g. zfs) depend on hardware cache flush to work yet
there has not been a rash of corrupted filesystems.  Many people use
these filesystems on non-enterprise hardware.

There are some devices which do fail to flush their cache or write
data properly.  Some SSDs likely re-write data while in use or idle
due to wear leveling in a way which causes a possibility of loss.

Enterprise disks are more tolerant of vibration, are specified to
have fewer uncorrected bit errors, and try for a bounded time to
recover bad sectors.

MacOS's target market is not storage.  The useful mass storage
offerings for hardware running MacOS is rather limited.

Operating systems where fsync() or fdatasync() do not truely commit
data to hardware are broken.

Bob
--
Bob Friesenhahn
[hidden email], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Jean-Christophe Deschamps-3
In reply to this post by Simon Slavin-3

At 22:41 15/11/2016, you wrote:

>So if you're truly worried about flush-to-disk what do you do
>?  Solution 1 is to buy hard disks rated for servers -- sometimes
>called "enterprise-class hard drives" -- and to set the DIP switches
>to tell them they're being used on a server.  Those things are
>intended for use in servers, and ACID can be important to a server, so
>they support it properly and do not lie.

It's even both best and worse than that.

I enjoy running an old and ugly diy machine with 8 SAS HP disks (15k
rpm) arranged in RAID6 behind a serious hardware controler having a
good share of battery-backed RAM. Those enterprise-class disks don't
have any switch and will lie to you and the OS as much as they can if
you let them do and don't ask for acknowledgement that each write is
final, but the key piece is the Areca controler which actually manages
the RAID array and knows when/if some write has actually landed on
surface. Should the power supply vanish, it keeps a track of what still
needs to be written and will silently update the disks when power comes
back, even before the OS is loaded.

So no, even very expensive hard disks themselves don't take any step to
avoid lying to you/OS if you/OS don't care over "details", but the
controler (if a serious one) will fill the gap and insure that
everything is fine.

To be honest I'm almost certain that there can exist extreme situations
where, for instance, the RAID array is in tatters (e.g. more than 2 of
6 disks simultaneously failing) and won't accept writes while the
controler battery is going dangerously low.
But if your needs are thusly mission critical you probably can afford
enough redundant sophisticated hardware to make the probability of a
disaster (e.g. a one-bit flip) as low as required.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

R Smith
In reply to this post by Jens Alfke-2

On 2016/11/15 10:34 PM, Jens Alfke wrote:

>
>> On Nov 15, 2016, at 10:57 AM, Simon Slavin <[hidden email]> wrote:
>>
>> sqlite> PRAGMA checkpoint_fullfsync;
>> 1
> I wasn’t aware of that pragma. Just tried it on my Mac (10.12.1), and its value is 1 even if I don’t first set pragma fullfsync; i.e. it defaults to 1. (Contradicting the docs.)
>
> So it appears that on macOS, SQLite does use F_FULLFSYNC when checkpointing, but not at other times that it fsyncs. What does that mean in actual use, assuming that I always use WAL mode? Is there still an opportunity for corruption in the face of power failures?
>
> (Sorry to be frothing at the mouth about this; but my team’s dealing with a few users/customers whose apps encounter db corruption, on Android as well as macOS, and we’re getting really frustrated trying to figure out what’s going on.)

Quite OK to be unsettled by learning that a flaw in the system that you
assumed did not exist, might be the cause of your troubles. I think
however something is missing in the complete understanding, so to be clear:

Calling F_FULLFSYNC when checkpointing or otherwise invokes a contract
between the running software (your system) and the Operating System
whereby the Operating system promises to A - put the current buffer's
worth of written data INTO the BUS feeding the writable media, AND B -
then asking said media to confirm the writing has happened (committed)  
BEFORE handing back control (moving your thread pointer along). This is
not exactly the same for all OSes, but more or less similar.

This can slow down things, but sometimes the security is worth the price
paid in latency. However, Hard drive manufacturers tend to sometimes lie
about having committed your data. It is a well known and almost
universally used method in standard desktop / laptop computers for the
harddrive to tell the OS that: "YES indeed, I have committed" when in
fact it is still piping data into the platters. Yes SSD's are better at
this by simple virtue of lower latency from buffer-to-silicone, but they
are not above lying either.

This means that unless you have a SERVER quality drive with typically
its own battery-backup that guarantees ANY buffered writes to reach the
platters, there simply is zero guarantee that all writes WILL go to
disk, and any normal system that guarantees it lies.

This does not mean however that you should be experiencing corruption.
SQLite might not be able to guarantee all writes reaching the disk, but
in most cases, if a final write did not happen, the usual last step in
committing a transaction is deleting / truncating a journal file or
writing a checkpoint marker or such, which, if it did not happen, should
have the entire write roll back (next time you open the DB) and leave
you in a non-corrupt state. IF this does not happen it means a write may
have happened out of order (not very common but can happen) or some
other worse problem occurred - most importantly, FULL_FSYNC isn't the
wild goose to be chasing. Whether or not any write happened is never an
acceptable cause of corruption, so trying to wrestle with the thing that
promises to make writes happen "more" as a causal relation to a
corruption problem, is simply moot. (This is vigorously tested with
every release of SQLite too).

If you can get the DB files (journals and all) from such a system where
a user claims to be able to reproduce the corruption reliably, that
would be an easy thing to check and the Devs here might learn something
from it. You can simply make something that copy all the DB files before
opening them at startup, until you have produced a corrupt DB, then
those last copied files will be the corrupted DB files that can be
investigated.

Good luck!
Ryan

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Jens Alfke-2
In reply to this post by Simon Slavin-3

> On Nov 15, 2016, at 1:46 PM, Simon Slavin <[hidden email]> wrote:
>
> Do these people admit they're letting their phones run out of power ?  

There’s nothing wrong with letting your phone run out of power, and software should be resilient to it. I don’t think that’s the problem, though, because mobile OS’s perform a clean shutdown when power drops below a minimum threshold, so it’s not like pulling the plug on a desktop computer (or holding down the power button on a laptop because the kernel froze.)

> If not, I suspect some sort of programming error somewhere.  Not necessarily in your own programming but sometimes in whatever database API you're using which eventually results in calls to the SQLite API.
> So is your software in C or C++ and calling the SQLite API directly ?

It’s possible. Our Android library is written in Java and uses some Java wrapper library around SQLite. I don’t know the details myself.

The case I’m concerned about happens on Macs, as I said, and the developer says he can trigger it in his app via a power failure or a forced shutdown (holding the power button for 5 seconds.) This version of our library is one I wrote, in Objective-C; it uses a thin Obj-C wrapper around SQLite, but it doesn’t do anything magic, and although it started out as 3rd party code I’ve tweaked it enough that I know it inside and out.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3

On 16 Nov 2016, at 7:25am, Jens Alfke <[hidden email]> wrote:

> The case I’m concerned about happens on Macs, as I said, and the developer says he can trigger it in his app via a power failure or a forced shutdown (holding the power button for 5 seconds.) This version of our library is one I wrote, in Objective-C; it uses a thin Obj-C wrapper around SQLite, but it doesn’t do anything magic, and although it started out as 3rd party code I’ve tweaked it enough that I know it inside and out.

Ah, that's enough information that someone around here should be able to help.  Can you use the shell tool to run

PRAGMA integrity_check

on one such corrupt file and post what it finds ?  If the output is long, posting a summary is fine.

There's also a tool from the SQLite team which can analyze a corrupt SQLite file and tell some things about how it is corrupt.  Unfortunately I can't remember what it's called or where to find it.  But I heard about it on this list and I hope someone can.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Richard Hipp-3
On 11/16/16, Simon Slavin <[hidden email]> wrote:
>
> There's also a tool from the SQLite team which can analyze a corrupt SQLite
> file and tell some things about how it is corrupt.  Unfortunately I can't
> remember what it's called or where to find it.  But I heard about it on this
> list and I hope someone can.
>

make showdb

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
This post has NOT been accepted by the mailing list yet.
Hi,

I know this is an old thread, but I just found it now when I was doing some research on this topic. Thanks Jens for starting this thread. When Jens said he knew a developer who could create a corrupted database by turning off the power, I'm pretty sure he was talking about me. This has been an ongoing problem for me for a while now. My customers keep getting corrupted SQLite databases due to power failures or forced shutdowns.

I wrote a small sample app to demonstrate the problem just to make sure it wasn't my own app causing the problem in some complex way.

Someone on this thread said to post a sample of a corrupted database. So here's a link to download a file that has become corrupted:

https://www.dropbox.com/s/5xwsfhorrfvefjy/corrupted-sqlite-powerfailure-mac.zip?dl=0

The way it started out was an SQLite file with a table in it called docs (along with other tables) with 408 rows in it. I imported a bunch of data into the SQLite file so that there should be 4402 rows in the docs table, along with data stored in other tables.

After the import and the data was committed to the database (while the app was still launched but not doing anything), I shutdown my MacBook Pro forcibly by holding the power key down until it shut off. When I restarted and opened the database file using the Mac SQLiteManager app, The docs table now contained only 2631 rows, with a lot of NULL data displayed in the docs table. That should not be possible with the library I'm using.

If you're interested in trying out the sample application I wrote to see the corruption for yourself, you can download it here:

https://www.dropbox.com/s/q2r4bz7n1d5fgag/PowerFailureTest.zip?dl=0

You'll need a Mac and Xcode 8.x to run it. I would just have only Xcode and the sample app running when I caused the Mac's power to be killed.

When the database is opened, it's using WAL journal_mode and also has pragma fullfsync=1 enabled.

Any ideas of settings I could try to resolve this problem would be greatly appreciated by me and my customers.

Thanks,

Brendan Duddridge
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

LincolnBurrows
This post has NOT been accepted by the mailing list yet.
In reply to this post by Jens Alfke-2
The easiest and most reliable way is to restore the database file from the backup or you can use SQLite Recovery Tool to recover database from corruption.
Loading...