Quantcast

Re: Database corruption, and PRAGMA fullfsync on macOS

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
33 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
Hi,

I know this is an old thread, but I just found it now when I was doing some
research on this topic. Thanks Jens for starting this thread. When Jens
said he knew a developer who could create a corrupted database by turning
off the power, I'm pretty sure he was talking about me. This has been an
ongoing problem for me for a while now. My customers keep getting corrupted
SQLite databases due to power failures or forced shutdowns.

I wrote a small sample app to demonstrate the problem just to make sure it
wasn't my own app causing the problem in some complex way.

Someone on this thread said to post a sample of a corrupted database. So
here's a link to download a file that has become corrupted:

https://www.dropbox.com/s/5xwsfhorrfvefjy/corrupted-sqlite-powerfailure-mac.zip?dl=0

The way it started out was an SQLite file with a table in it called docs
(along with other tables) with 408 rows in it. I imported a bunch of data
into the SQLite file so that there should be 4402 rows in the docs table,
along with data stored in other tables.

After the import and the data was committed to the database (while the app
was still launched but not doing anything), I shutdown my MacBook Pro
forcibly by holding the power key down until it shut off. When I restarted
and opened the database file using the Mac SQLiteManager app, The docs
table now contained only 2631 rows, with a lot of NULL data displayed in
the docs table. That should not be possible with the library I'm using.

If you're interested in trying out the sample application I wrote to see
the corruption for yourself, you can download it here:

https://www.dropbox.com/s/q2r4bz7n1d5fgag/PowerFailureTest.zip?dl=0

You'll need a Mac and Xcode 8.x to run it. I would just have only Xcode and
the sample app running when I caused the Mac's power to be killed.

When the database is opened, it's using WAL journal_mode and also has
pragma fullfsync=1 enabled.

Any ideas of settings I could try to resolve this problem would be greatly
appreciated by me and my customers.

Thanks,

Brendan Duddridge
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Richard Hipp-3
I cannot seem to find the implementation for COLLATE JSON anywhere in
your source code.  Can you give me a hint as to which source file I
should be looking in?

On 4/16/17, Brendan Duddridge <[hidden email]> wrote:

> Hi,
>
> I know this is an old thread, but I just found it now when I was doing some
> research on this topic. Thanks Jens for starting this thread. When Jens
> said he knew a developer who could create a corrupted database by turning
> off the power, I'm pretty sure he was talking about me. This has been an
> ongoing problem for me for a while now. My customers keep getting corrupted
> SQLite databases due to power failures or forced shutdowns.
>
> I wrote a small sample app to demonstrate the problem just to make sure it
> wasn't my own app causing the problem in some complex way.
>
> Someone on this thread said to post a sample of a corrupted database. So
> here's a link to download a file that has become corrupted:
>
> https://www.dropbox.com/s/5xwsfhorrfvefjy/corrupted-sqlite-powerfailure-mac.zip?dl=0
>
> The way it started out was an SQLite file with a table in it called docs
> (along with other tables) with 408 rows in it. I imported a bunch of data
> into the SQLite file so that there should be 4402 rows in the docs table,
> along with data stored in other tables.
>
> After the import and the data was committed to the database (while the app
> was still launched but not doing anything), I shutdown my MacBook Pro
> forcibly by holding the power key down until it shut off. When I restarted
> and opened the database file using the Mac SQLiteManager app, The docs
> table now contained only 2631 rows, with a lot of NULL data displayed in
> the docs table. That should not be possible with the library I'm using.
>
> If you're interested in trying out the sample application I wrote to see
> the corruption for yourself, you can download it here:
>
> https://www.dropbox.com/s/q2r4bz7n1d5fgag/PowerFailureTest.zip?dl=0
>
> You'll need a Mac and Xcode 8.x to run it. I would just have only Xcode and
> the sample app running when I caused the Mac's power to be killed.
>
> When the database is opened, it's using WAL journal_mode and also has
> pragma fullfsync=1 enabled.
>
> Any ideas of settings I could try to resolve this problem would be greatly
> appreciated by me and my customers.
>
> Thanks,
>
> Brendan Duddridge
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Richard Hipp-3
On 4/16/17, Richard Hipp <[hidden email]> wrote:
> I cannot seem to find the implementation for COLLATE JSON anywhere in
> your source code.  Can you give me a hint as to which source file I
> should be looking in?

I worked around the "COLLATE JSON" problem (by writing my own JSON
collation).  That allows me to analyze your database and WAL file.

It appears that the WAL file is not the correct WAL file for that
database.  It is as if someone has taken an unrelated WAL file and
renamed it to have the same base name as your database.  Or the other
way around - someone has renamed your database to have the same base
name as the WAL file.

Your demonstration application does not call SQLite directly.  Instead
it appears to use two libraries that in turn call SQLite:
libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
sources to these other libraries, so I am unable to deduce what they
are doing.

So, in the absence of further evidence, I am going to diagnose this as
a misuse of SQLite by one of the two libraries that you are linking -
probably a misuse in the form of trying to rename or unlink or
otherwise modify the database file using ordinary operating system
calls while a connection to the database is open.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
In reply to this post by brendand
Thanks Richard for your reply.

Sorry about the COLLATE problem. That's a Couchbase Lite thing.

I find it weird that a different WAL file is getting in there somehow when
a power failure occurs. I'm a bit stumped at how it can even write to the
file system the moment the power shuts down so I don't even understand how
the database file can get corrupted.  Unless the corruption happens when
the database file is opened up next and tries to use this improper WAL file.

In my main project I'm using SQLCipher as my SQLite layer. But in the demo
project I posted, I'm just using the built-in macOS Sierra SQLite library.
In both cases though I can cause SQLite file corruption by cutting the
power on my MacBook Pro.

I'm going to do further investigations to see if there's any misuse of
SQLite that I can find. Although the developers of Couchbase Lite are far
smarter than I and I'm sure they're doing things by the book.

Perhaps the only think I can think of that I'm doing that may be unorthodox
is by storing the SQLite files inside a macOS package. But at the unix
layer that's really just a directory, so I don't know how that could cause
a problem. Unless macOS treats the package in a way differently than a
normal folder and is causing things to get moved around or written when a
power failure occurs.

This problem has been plaguing me for quite a long time actually. I hope
that I can find a solution somehow.

Thanks,

Brendan

> I worked around the "COLLATE JSON" problem (by writing my own JSON
> collation).  That allows me to analyze your database and WAL file.
>
> It appears that the WAL file is not the correct WAL file for that
> database.  It is as if someone has taken an unrelated WAL file and
> renamed it to have the same base name as your database.  Or the other
> way around - someone has renamed your database to have the same base
> name as the WAL file.
>
> Your demonstration application does not call SQLite directly.  Instead
> it appears to use two libraries that in turn call SQLite:
> libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> sources to these other libraries, so I am unable to deduce what they
> are doing.
>
> So, in the absence of further evidence, I am going to diagnose this as
> a misuse of SQLite by one of the two libraries that you are linking -
> probably a misuse in the form of trying to rename or unlink or
> otherwise modify the database file using ordinary operating system
> calls while a connection to the database is open.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Rowan Worth-2
Hey Brendan,

I'm no OSX expert, but from what I've read this afternoon about NSDocument
and friends combined with what I know about sqlite I have to say you are
completely mad to continue passing sqlite databases to NSDocument,
*especially* as you don't define your own sub-class to do any of the file
management.

Relying on some NSDocument sub-class builtin to the system and then also
opening the DB with CouchbaseLite may well violate section 2.2.1 of "how to
corrupt an sqlite database" (multiple copies of sqlite linked into the same
application). Even if not, it sounds like NSDocument has a tendency to copy
files around for eg. auto-save purposes.

I'd be very *very* surprised if there's an sqlite bug here. I'd take the
advice given to you on bountysource and watch your application's file
system operations to begin to understand what is going on behind the scenes.

-Rowan



On 18 April 2017 at 13:01, Brendan Duddridge <[hidden email]> wrote:

> Thanks Richard for your reply.
>
> Sorry about the COLLATE problem. That's a Couchbase Lite thing.
>
> I find it weird that a different WAL file is getting in there somehow when
> a power failure occurs. I'm a bit stumped at how it can even write to the
> file system the moment the power shuts down so I don't even understand how
> the database file can get corrupted.  Unless the corruption happens when
> the database file is opened up next and tries to use this improper WAL
> file.
>
> In my main project I'm using SQLCipher as my SQLite layer. But in the demo
> project I posted, I'm just using the built-in macOS Sierra SQLite library.
> In both cases though I can cause SQLite file corruption by cutting the
> power on my MacBook Pro.
>
> I'm going to do further investigations to see if there's any misuse of
> SQLite that I can find. Although the developers of Couchbase Lite are far
> smarter than I and I'm sure they're doing things by the book.
>
> Perhaps the only think I can think of that I'm doing that may be unorthodox
> is by storing the SQLite files inside a macOS package. But at the unix
> layer that's really just a directory, so I don't know how that could cause
> a problem. Unless macOS treats the package in a way differently than a
> normal folder and is causing things to get moved around or written when a
> power failure occurs.
>
> This problem has been plaguing me for quite a long time actually. I hope
> that I can find a solution somehow.
>
> Thanks,
>
> Brendan
>
> > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > collation).  That allows me to analyze your database and WAL file.
> >
> > It appears that the WAL file is not the correct WAL file for that
> > database.  It is as if someone has taken an unrelated WAL file and
> > renamed it to have the same base name as your database.  Or the other
> > way around - someone has renamed your database to have the same base
> > name as the WAL file.
> >
> > Your demonstration application does not call SQLite directly.  Instead
> > it appears to use two libraries that in turn call SQLite:
> > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > sources to these other libraries, so I am unable to deduce what they
> > are doing.
> >
> > So, in the absence of further evidence, I am going to diagnose this as
> > a misuse of SQLite by one of the two libraries that you are linking -
> > probably a misuse in the form of trying to rename or unlink or
> > otherwise modify the database file using ordinary operating system
> > calls while a connection to the database is open.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Richard Hipp-3
In reply to this post by brendand
On 4/18/17, Brendan Duddridge <[hidden email]> wrote:
> In both cases though I can cause SQLite file corruption by cutting the
> power on my MacBook Pro.

Corruption-by-power-loss problems tend to be very sensitive to timing.
If you are able to reliably generate the problem by leisurely cutting
power (holding down the power button) that suggests something else.
You might be able to create the same effect by issuing a SIGKILL on
the process (kill -9) at the same point that you would normally cut
the power.

I'm thinking you have some problem similar to that which is described
in section 2.4 of https://www.sqlite.org/howtocorrupt.html

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
In reply to this post by Rowan Worth-2
Hey Rowan,

In my Mac app I'm actually not using NSDocument. I used to and then I
changed my document subclass to extend NSObject instead, just in case
NSDocument was doing bad things behind my back. I had also disabled
auto-save and versions anyway. For a while though I tried extending
NSPersistentDocument even though my app isn't using CoreData. But it also
didn't make a difference.

I'm still storing my SQLite database in a file package even without using
NSDocument. The strange thing though is this is never a problem on iOS
where my code extends UIDocument, but for all intents and purposes behaves
pretty much the same way, at least in terms of organization of the database
files. Only macOS seems to have the corruption problem.

On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth <[hidden email]> wrote:

> Hey Brendan,
>
> I'm no OSX expert, but from what I've read this afternoon about NSDocument
> and friends combined with what I know about sqlite I have to say you are
> completely mad to continue passing sqlite databases to NSDocument,
> *especially* as you don't define your own sub-class to do any of the file
> management.
>
> Relying on some NSDocument sub-class builtin to the system and then also
> opening the DB with CouchbaseLite may well violate section 2.2.1 of "how to
> corrupt an sqlite database" (multiple copies of sqlite linked into the same
> application). Even if not, it sounds like NSDocument has a tendency to copy
> files around for eg. auto-save purposes.
>
> I'd be very *very* surprised if there's an sqlite bug here. I'd take the
> advice given to you on bountysource and watch your application's file
> system operations to begin to understand what is going on behind the
> scenes.
>
> -Rowan
>
>
>
> On 18 April 2017 at 13:01, Brendan Duddridge <[hidden email]> wrote:
>
> > Thanks Richard for your reply.
> >
> > Sorry about the COLLATE problem. That's a Couchbase Lite thing.
> >
> > I find it weird that a different WAL file is getting in there somehow
> when
> > a power failure occurs. I'm a bit stumped at how it can even write to the
> > file system the moment the power shuts down so I don't even understand
> how
> > the database file can get corrupted.  Unless the corruption happens when
> > the database file is opened up next and tries to use this improper WAL
> > file.
> >
> > In my main project I'm using SQLCipher as my SQLite layer. But in the
> demo
> > project I posted, I'm just using the built-in macOS Sierra SQLite
> library.
> > In both cases though I can cause SQLite file corruption by cutting the
> > power on my MacBook Pro.
> >
> > I'm going to do further investigations to see if there's any misuse of
> > SQLite that I can find. Although the developers of Couchbase Lite are far
> > smarter than I and I'm sure they're doing things by the book.
> >
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the unix
> > layer that's really just a directory, so I don't know how that could
> cause
> > a problem. Unless macOS treats the package in a way differently than a
> > normal folder and is causing things to get moved around or written when a
> > power failure occurs.
> >
> > This problem has been plaguing me for quite a long time actually. I hope
> > that I can find a solution somehow.
> >
> > Thanks,
> >
> > Brendan
> >
> > > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > > collation).  That allows me to analyze your database and WAL file.
> > >
> > > It appears that the WAL file is not the correct WAL file for that
> > > database.  It is as if someone has taken an unrelated WAL file and
> > > renamed it to have the same base name as your database.  Or the other
> > > way around - someone has renamed your database to have the same base
> > > name as the WAL file.
> > >
> > > Your demonstration application does not call SQLite directly.  Instead
> > > it appears to use two libraries that in turn call SQLite:
> > > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > > sources to these other libraries, so I am unable to deduce what they
> > > are doing.
> > >
> > > So, in the absence of further evidence, I am going to diagnose this as
> > > a misuse of SQLite by one of the two libraries that you are linking -
> > > probably a misuse in the form of trying to rename or unlink or
> > > otherwise modify the database file using ordinary operating system
> > > calls while a connection to the database is open.
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
In reply to this post by Rowan Worth-2
Hi Rowan,

My apologies. My sample app does use NSDocument. But my production app
doesn't. And they both exhibit the same problem.

On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth <[hidden email]> wrote:

> Hey Brendan,
>
> I'm no OSX expert, but from what I've read this afternoon about NSDocument
> and friends combined with what I know about sqlite I have to say you are
> completely mad to continue passing sqlite databases to NSDocument,
> *especially* as you don't define your own sub-class to do any of the file
> management.
>
> Relying on some NSDocument sub-class builtin to the system and then also
> opening the DB with CouchbaseLite may well violate section 2.2.1 of "how to
> corrupt an sqlite database" (multiple copies of sqlite linked into the same
> application). Even if not, it sounds like NSDocument has a tendency to copy
> files around for eg. auto-save purposes.
>
> I'd be very *very* surprised if there's an sqlite bug here. I'd take the
> advice given to you on bountysource and watch your application's file
> system operations to begin to understand what is going on behind the
> scenes.
>
> -Rowan
>
>
>
> On 18 April 2017 at 13:01, Brendan Duddridge <[hidden email]> wrote:
>
> > Thanks Richard for your reply.
> >
> > Sorry about the COLLATE problem. That's a Couchbase Lite thing.
> >
> > I find it weird that a different WAL file is getting in there somehow
> when
> > a power failure occurs. I'm a bit stumped at how it can even write to the
> > file system the moment the power shuts down so I don't even understand
> how
> > the database file can get corrupted.  Unless the corruption happens when
> > the database file is opened up next and tries to use this improper WAL
> > file.
> >
> > In my main project I'm using SQLCipher as my SQLite layer. But in the
> demo
> > project I posted, I'm just using the built-in macOS Sierra SQLite
> library.
> > In both cases though I can cause SQLite file corruption by cutting the
> > power on my MacBook Pro.
> >
> > I'm going to do further investigations to see if there's any misuse of
> > SQLite that I can find. Although the developers of Couchbase Lite are far
> > smarter than I and I'm sure they're doing things by the book.
> >
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the unix
> > layer that's really just a directory, so I don't know how that could
> cause
> > a problem. Unless macOS treats the package in a way differently than a
> > normal folder and is causing things to get moved around or written when a
> > power failure occurs.
> >
> > This problem has been plaguing me for quite a long time actually. I hope
> > that I can find a solution somehow.
> >
> > Thanks,
> >
> > Brendan
> >
> > > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > > collation).  That allows me to analyze your database and WAL file.
> > >
> > > It appears that the WAL file is not the correct WAL file for that
> > > database.  It is as if someone has taken an unrelated WAL file and
> > > renamed it to have the same base name as your database.  Or the other
> > > way around - someone has renamed your database to have the same base
> > > name as the WAL file.
> > >
> > > Your demonstration application does not call SQLite directly.  Instead
> > > it appears to use two libraries that in turn call SQLite:
> > > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > > sources to these other libraries, so I am unable to deduce what they
> > > are doing.
> > >
> > > So, in the absence of further evidence, I am going to diagnose this as
> > > a misuse of SQLite by one of the two libraries that you are linking -
> > > probably a misuse in the form of trying to rename or unlink or
> > > otherwise modify the database file using ordinary operating system
> > > calls while a connection to the database is open.
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Deon Brewis
Are you by change using memory mapped IO (MMAP_SIZE something other than 0)?

This does not work on OSX. Not even remotely. I tracked an issue down in November 2015, and was able to trivially corrupt a database 100% of the time using the repro steps below. This happens long after our app gets shut down and SQLITE is flushed.  It got fixed when I set MMAP_SIZE to 0.

Repro (100%):

1) Run our app
2) Shutdown our app
3) Wait for our app to cleanly shut down – nothing showing in Activity Monitor – and app.db-wal deleted from disk (i.e. SQLITE clean close)
4) Wait 2 minutes (so our app isn’t running in this 2 minute period at all)

5) Copy the our db file to a NAS
6) Hard reboot the machine (power cycle).
7) Copy the db file to the NAS again (no reopening the app, just copy the file back to the NAS after the reboot).

Observe…

The file from #5 still works fine.
The file from #7 is corrupted.

Note that the app or sqlite is nowhere involved in between #5 and #7


I made this note in the bug when I fixed it - I believe it's related, but don't have the exact context:
“From the OSX documentation:

Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence. Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written.  The disk drive may also re-order the data so that later writes may be present, while earlier writes are not.

This is not a theoretical edge case.  This scenario is easily reproduced with real world workloads and drive power failures.”

- Deon

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Brendan Duddridge
Sent: Tuesday, April 18, 2017 1:36 AM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS

Hi Rowan,

My apologies. My sample app does use NSDocument. But my production app doesn't. And they both exhibit the same problem.

On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth <[hidden email]> wrote:

> Hey Brendan,
>
> I'm no OSX expert, but from what I've read this afternoon about
> NSDocument and friends combined with what I know about sqlite I have
> to say you are completely mad to continue passing sqlite databases to
> NSDocument,
> *especially* as you don't define your own sub-class to do any of the
> file management.
>
> Relying on some NSDocument sub-class builtin to the system and then
> also opening the DB with CouchbaseLite may well violate section 2.2.1
> of "how to corrupt an sqlite database" (multiple copies of sqlite
> linked into the same application). Even if not, it sounds like
> NSDocument has a tendency to copy files around for eg. auto-save purposes.
>
> I'd be very *very* surprised if there's an sqlite bug here. I'd take
> the advice given to you on bountysource and watch your application's
> file system operations to begin to understand what is going on behind
> the scenes.
>
> -Rowan
>
>
>
> On 18 April 2017 at 13:01, Brendan Duddridge <[hidden email]> wrote:
>
> > Thanks Richard for your reply.
> >
> > Sorry about the COLLATE problem. That's a Couchbase Lite thing.
> >
> > I find it weird that a different WAL file is getting in there
> > somehow
> when
> > a power failure occurs. I'm a bit stumped at how it can even write
> > to the file system the moment the power shuts down so I don't even
> > understand
> how
> > the database file can get corrupted.  Unless the corruption happens
> > when the database file is opened up next and tries to use this
> > improper WAL file.
> >
> > In my main project I'm using SQLCipher as my SQLite layer. But in
> > the
> demo
> > project I posted, I'm just using the built-in macOS Sierra SQLite
> library.
> > In both cases though I can cause SQLite file corruption by cutting
> > the power on my MacBook Pro.
> >
> > I'm going to do further investigations to see if there's any misuse
> > of SQLite that I can find. Although the developers of Couchbase Lite
> > are far smarter than I and I'm sure they're doing things by the book.
> >
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the
> > unix layer that's really just a directory, so I don't know how that
> > could
> cause
> > a problem. Unless macOS treats the package in a way differently than
> > a normal folder and is causing things to get moved around or written
> > when a power failure occurs.
> >
> > This problem has been plaguing me for quite a long time actually. I
> > hope that I can find a solution somehow.
> >
> > Thanks,
> >
> > Brendan
> >
> > > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > > collation).  That allows me to analyze your database and WAL file.
> > >
> > > It appears that the WAL file is not the correct WAL file for that
> > > database.  It is as if someone has taken an unrelated WAL file and
> > > renamed it to have the same base name as your database.  Or the
> > > other way around - someone has renamed your database to have the
> > > same base name as the WAL file.
> > >
> > > Your demonstration application does not call SQLite directly.  
> > > Instead it appears to use two libraries that in turn call SQLite:
> > > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > > sources to these other libraries, so I am unable to deduce what
> > > they are doing.
> > >
> > > So, in the absence of further evidence, I am going to diagnose
> > > this as a misuse of SQLite by one of the two libraries that you
> > > are linking - probably a misuse in the form of trying to rename or
> > > unlink or otherwise modify the database file using ordinary
> > > operating system calls while a connection to the database is open.
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3
In reply to this post by brendand

On 18 Apr 2017, at 6:01am, Brendan Duddridge <[hidden email]> wrote:

> Perhaps the only think I can think of that I'm doing that may be unorthodox
> is by storing the SQLite files inside a macOS package. But at the unix
> layer that's really just a directory, so I don't know how that could cause
> a problem. Unless macOS treats the package in a way differently than a
> normal folder and is causing things to get moved around or written when a
> power failure occurs.

You should be safe storing a SQLite database inside a package.  As you write, a package is really just a folder with one extra bit set.

Any demonstration code which uses the SQLite API directly we can attempt to debug.

If the problem you need solving is actually with Couchbase, you can solve it by upgrading to any version from 2.0 onwards.  Since that no longer uses SQLite, SQLite database corruption can no longer be a problem.  Current versions of Couchbase no longer use SQLite for a persistence layer and Couchbase developers are unlikely to assist with any problem you find with such an old version.  

If your program demonstrating the problem has access to the SQLite database solely via libsqlcrypt.a, can you explain what you’re using it for ?  Is your database actually direct access to a Lotus Notes file ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
In reply to this post by Richard Hipp-3
Hi Richard,

I just did another test with my PowerFailureTest app. I launched it and the
count of rows in it was 4402. Without even running the function to import
my data that writes to the SQLite file, I cut the power (holding down the
power button on my MBP).

When my Mac finally rebooted, I saw that the WAL file was 0 bytes, the SHM
file was 32768 bytes, and the db.sqlite file was 10.5 MB (as it was before
the test).

But now when I launch the test app, These messages display in the console
log:

*02:42:38.630| WARNING: SQLite error (code 11): database corruption at line
60553 of [2ef4f3a5b1] {at errorLogCallback:125}*

*02:42:38.631| WARNING: SQLite error (code 11): statement aborts at 18:
[SELECT revs.doc_id, docid, revid, sequence FROM revs, docs WHERE
docs.doc_id = revs.doc_id AND current=1 AND deleted=0 ORDER BY docid ASC,
revid DESC LIMIT ? OFFSET ?] database disk i {at errorLogCallback:125}*

*2017-04-18 02:42:38.630791-0600 PowerFailureTest[672:9264] Unknown error
calling sqlite3_step (11: database disk image is malformed) rs*

*2017-04-18 02:42:38.632178-0600 PowerFailureTest[672:9264] doc count: 408*

So there's definitely some corruption going on. The file was fine before I
cut the power.



On Tue, Apr 18, 2017 at 2:23 AM, Richard Hipp <[hidden email]> wrote:

> On 4/18/17, Brendan Duddridge <[hidden email]> wrote:
> > In both cases though I can cause SQLite file corruption by cutting the
> > power on my MacBook Pro.
>
> Corruption-by-power-loss problems tend to be very sensitive to timing.
> If you are able to reliably generate the problem by leisurely cutting
> power (holding down the power button) that suggests something else.
> You might be able to create the same effect by issuing a SIGKILL on
> the process (kill -9) at the same point that you would normally cut
> the power.
>
> I'm thinking you have some problem similar to that which is described
> in section 2.4 of https://www.sqlite.org/howtocorrupt.html
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3
In reply to this post by Deon Brewis

On 18 Apr 2017, at 9:50am, Deon Brewis <[hidden email]> wrote:

> “From the OSX documentation:
>
> Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"),

Deon,

I’m not sure this is related, but have you seen

<http://www.sqlite.org/pragma.html#pragma_fullfsync>

<http://www.sqlite.org/pragma.html#pragma_checkpoint_fullfsync>

?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
In reply to this post by Simon Slavin-3
Hi Simon,

I'm using Couchbase Lite actually, not Couchbase server. Couchbase Lite 1.4
to be specific. Couchbase Lite 2.0 is still under development and it does
in fact use SQLite under the hood as does Couchbase Lite 1.4. They had
worked on using ForestDB, but I think they've abandoned that for their
Couchbase Lite 2.0 version.

I've tried calling pragma fullfsync=1 just after the database file was
opened, but I still had the same problem with corruption after a power
failure.

There's a big discussion I've had with the Couchbase Lite developers here
on their Github issues:

https://github.com/couchbase/couchbase-lite-ios/issues/1482

(newer posts are at the bottom)

On Tue, Apr 18, 2017 at 2:52 AM, Simon Slavin <[hidden email]> wrote:

>
> On 18 Apr 2017, at 6:01am, Brendan Duddridge <[hidden email]> wrote:
>
> > Perhaps the only think I can think of that I'm doing that may be
> unorthodox
> > is by storing the SQLite files inside a macOS package. But at the unix
> > layer that's really just a directory, so I don't know how that could
> cause
> > a problem. Unless macOS treats the package in a way differently than a
> > normal folder and is causing things to get moved around or written when a
> > power failure occurs.
>
> You should be safe storing a SQLite database inside a package.  As you
> write, a package is really just a folder with one extra bit set.
>
> Any demonstration code which uses the SQLite API directly we can attempt
> to debug.
>
> If the problem you need solving is actually with Couchbase, you can solve
> it by upgrading to any version from 2.0 onwards.  Since that no longer uses
> SQLite, SQLite database corruption can no longer be a problem.  Current
> versions of Couchbase no longer use SQLite for a persistence layer and
> Couchbase developers are unlikely to assist with any problem you find with
> such an old version.
>
> If your program demonstrating the problem has access to the SQLite
> database solely via libsqlcrypt.a, can you explain what you’re using it for
> ?  Is your database actually direct access to a Lotus Notes file ?
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
In reply to this post by Deon Brewis
Hi Deon,

That's interesting about the memory mapped IO. I just looked in the source
for Couchbase Lite and they do something like that on setup:

+ (void) firstTimeSetup {

    // Test the version of the actual SQLite implementation at runtime.
Necessary because

    // the app might be linked with a custom version of SQLite (like
SQLCipher) instead of the

    // system library, so the actual version/features may differ from what
was declared in

    // sqlite3.h at compile time.

    Log(@"Couchbase Lite using SQLite version %s (%s)",

        sqlite3_libversion(), sqlite3_sourceid());

#if 0

    for (int i=0; true; i++) {

        const char* opt = sqlite3_compileoption_get(i);

        if (!opt)

            break;

        Log(@"SQLite option '%s'", opt);

    }

#endif

    sSQLiteVersion = sqlite3_libversion_number();

    Assert(sSQLiteVersion >= 3007000,

           @"SQLite library is too old (%s); needs to be at least 3.7",
sqlite3_libversion());


    // Enable memory-mapped I/O if available

#ifndef SQLITE_CONFIG_MMAP_SIZE

#define SQLITE_CONFIG_MMAP_SIZE    22  /* sqlite3_int64, sqlite3_int64 */

#endif

    int err = sqlite3_config(SQLITE_CONFIG_MMAP_SIZE, (SInt64)
kSQLiteMMapSize, (SInt64)-1);

    if (err != SQLITE_OK)

        Log(@"FYI, couldn't enable SQLite mmap: error %d", err);


    sqlite3_config(SQLITE_CONFIG_LOG, errorLogCallback, NULL);

}

From your description, it sounds like that be causing this problem. Hmm...

Brendan


On Tue, Apr 18, 2017 at 2:50 AM, Deon Brewis <[hidden email]> wrote:

> Are you by change using memory mapped IO (MMAP_SIZE something other than
> 0)?
>
> This does not work on OSX. Not even remotely. I tracked an issue down in
> November 2015, and was able to trivially corrupt a database 100% of the
> time using the repro steps below. This happens long after our app gets shut
> down and SQLITE is flushed.  It got fixed when I set MMAP_SIZE to 0.
>
> Repro (100%):
>
> 1) Run our app
> 2) Shutdown our app
> 3) Wait for our app to cleanly shut down – nothing showing in Activity
> Monitor – and app.db-wal deleted from disk (i.e. SQLITE clean close)
> 4) Wait 2 minutes (so our app isn’t running in this 2 minute period at all)
>
> 5) Copy the our db file to a NAS
> 6) Hard reboot the machine (power cycle).
> 7) Copy the db file to the NAS again (no reopening the app, just copy the
> file back to the NAS after the reboot).
>
> Observe…
>
> The file from #5 still works fine.
> The file from #7 is corrupted.
>
> Note that the app or sqlite is nowhere involved in between #5 and #7
>
>
> I made this note in the bug when I fixed it - I believe it's related, but
> don't have the exact context:
> “From the OSX documentation:
>
> Note that while fsync() will flush all data from the host to the drive
> (i.e. the "permanent storage device"), the drive itself may not physically
> write the data to the platters for quite some time and it may be written in
> an out-of-order sequence. Specifically, if the drive loses power or the OS
> crashes, the application may find that only some or none of their data was
> written.  The disk drive may also re-order the data so that later writes
> may be present, while earlier writes are not.
>
> This is not a theoretical edge case.  This scenario is easily reproduced
> with real world workloads and drive power failures.”
>
> - Deon
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Brendan Duddridge
> Sent: Tuesday, April 18, 2017 1:36 AM
> To: SQLite mailing list <[hidden email]>
> Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS
>
> Hi Rowan,
>
> My apologies. My sample app does use NSDocument. But my production app
> doesn't. And they both exhibit the same problem.
>
> On Tue, Apr 18, 2017 at 12:29 AM, Rowan Worth <[hidden email]> wrote:
>
> > Hey Brendan,
> >
> > I'm no OSX expert, but from what I've read this afternoon about
> > NSDocument and friends combined with what I know about sqlite I have
> > to say you are completely mad to continue passing sqlite databases to
> > NSDocument,
> > *especially* as you don't define your own sub-class to do any of the
> > file management.
> >
> > Relying on some NSDocument sub-class builtin to the system and then
> > also opening the DB with CouchbaseLite may well violate section 2.2.1
> > of "how to corrupt an sqlite database" (multiple copies of sqlite
> > linked into the same application). Even if not, it sounds like
> > NSDocument has a tendency to copy files around for eg. auto-save
> purposes.
> >
> > I'd be very *very* surprised if there's an sqlite bug here. I'd take
> > the advice given to you on bountysource and watch your application's
> > file system operations to begin to understand what is going on behind
> > the scenes.
> >
> > -Rowan
> >
> >
> >
> > On 18 April 2017 at 13:01, Brendan Duddridge <[hidden email]> wrote:
> >
> > > Thanks Richard for your reply.
> > >
> > > Sorry about the COLLATE problem. That's a Couchbase Lite thing.
> > >
> > > I find it weird that a different WAL file is getting in there
> > > somehow
> > when
> > > a power failure occurs. I'm a bit stumped at how it can even write
> > > to the file system the moment the power shuts down so I don't even
> > > understand
> > how
> > > the database file can get corrupted.  Unless the corruption happens
> > > when the database file is opened up next and tries to use this
> > > improper WAL file.
> > >
> > > In my main project I'm using SQLCipher as my SQLite layer. But in
> > > the
> > demo
> > > project I posted, I'm just using the built-in macOS Sierra SQLite
> > library.
> > > In both cases though I can cause SQLite file corruption by cutting
> > > the power on my MacBook Pro.
> > >
> > > I'm going to do further investigations to see if there's any misuse
> > > of SQLite that I can find. Although the developers of Couchbase Lite
> > > are far smarter than I and I'm sure they're doing things by the book.
> > >
> > > Perhaps the only think I can think of that I'm doing that may be
> > unorthodox
> > > is by storing the SQLite files inside a macOS package. But at the
> > > unix layer that's really just a directory, so I don't know how that
> > > could
> > cause
> > > a problem. Unless macOS treats the package in a way differently than
> > > a normal folder and is causing things to get moved around or written
> > > when a power failure occurs.
> > >
> > > This problem has been plaguing me for quite a long time actually. I
> > > hope that I can find a solution somehow.
> > >
> > > Thanks,
> > >
> > > Brendan
> > >
> > > > I worked around the "COLLATE JSON" problem (by writing my own JSON
> > > > collation).  That allows me to analyze your database and WAL file.
> > > >
> > > > It appears that the WAL file is not the correct WAL file for that
> > > > database.  It is as if someone has taken an unrelated WAL file and
> > > > renamed it to have the same base name as your database.  Or the
> > > > other way around - someone has renamed your database to have the
> > > > same base name as the WAL file.
> > > >
> > > > Your demonstration application does not call SQLite directly.
> > > > Instead it appears to use two libraries that in turn call SQLite:
> > > > libsqlcrypt.a and the CouchbaseLite.framework.  You do not provide
> > > > sources to these other libraries, so I am unable to deduce what
> > > > they are doing.
> > > >
> > > > So, in the absence of further evidence, I am going to diagnose
> > > > this as a misuse of SQLite by one of the two libraries that you
> > > > are linking - probably a misuse in the form of trying to rename or
> > > > unlink or otherwise modify the database file using ordinary
> > > > operating system calls while a connection to the database is open.
> > > _______________________________________________
> > > sqlite-users mailing list
> > > [hidden email]
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3

On 18 Apr 2017, at 10:07am, Brendan Duddridge <[hidden email]> wrote:
>
>    // Enable memory-mapped I/O if available

Please try an alternative version of that code which definitely does not use memory mapping in any way.  You may be able to do it by changing that IFDEF sequence.  Or you should instead have SQLite execute this command

PRAGMA mmap_size=0;

before it opens any files.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Deon Brewis
In reply to this post by Simon Slavin-3
Yip. Tried that. At some point I had like 6 or 7 debug switches in the app to try all manner of FULLSYNC/WAL/SYNCHRONOUS combinations. At the end it was the MMAP_SIZE that did it.

It's not like it was subtle - it's a dead on repro. I was able to repro this by doing a power cycle 2 hours after shutting the app down. OSX didn't seem to have any interest in flushing mmap files until you soft reboot the machine.

The last time I tried this though was on Yosemite and Mavericks and whatever version of SQLITE was out at the time, so things may be different now. But it would be the first place I would look for corruption on OSX related to power cycling.

- Deon

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Tuesday, April 18, 2017 1:57 AM
To: SQLite mailing list <[hidden email]>
Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS


On 18 Apr 2017, at 9:50am, Deon Brewis <[hidden email]> wrote:

> “From the OSX documentation:
>
> Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"),

Deon,

I’m not sure this is related, but have you seen

<http://www.sqlite.org/pragma.html#pragma_fullfsync>

<http://www.sqlite.org/pragma.html#pragma_checkpoint_fullfsync>

?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

brendand
Hi Deon,

BINGO!!!!

I commented out this line of code in the Couchbase lite CBL_SQliteStorage.m
source file and no more corruption!!!!

//    int err = sqlite3_config(SQLITE_CONFIG_MMAP_SIZE,
(SInt64)kSQLiteMMapSize, (SInt64)-1);


It would be really awesome of this were added to the How to Corrupt an
SQLite Database web page. Although it would probably be prudent if someone
could verify these findings using my sample app and a patched version of
Couchbase Lite. I'll post the solution to the Couchbase Lite Github issue
that I started.

I used to be able to reproduce the corruption with a single power cut. But
now after commenting out that code, I'm unable to corrupt the database
after about 6 or 7 power cuts. So I think it's safe to say that this
problem is solved.

Big thanks to you Deon. You just saved my bacon! Thank you so much.


Brendan


On Tue, Apr 18, 2017 at 3:20 AM, Deon Brewis <[hidden email]> wrote:

> Yip. Tried that. At some point I had like 6 or 7 debug switches in the app
> to try all manner of FULLSYNC/WAL/SYNCHRONOUS combinations. At the end it
> was the MMAP_SIZE that did it.
>
> It's not like it was subtle - it's a dead on repro. I was able to repro
> this by doing a power cycle 2 hours after shutting the app down. OSX didn't
> seem to have any interest in flushing mmap files until you soft reboot the
> machine.
>
> The last time I tried this though was on Yosemite and Mavericks and
> whatever version of SQLITE was out at the time, so things may be different
> now. But it would be the first place I would look for corruption on OSX
> related to power cycling.
>
> - Deon
>
> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Simon Slavin
> Sent: Tuesday, April 18, 2017 1:57 AM
> To: SQLite mailing list <[hidden email]>
> Subject: Re: [sqlite] Database corruption, and PRAGMA fullfsync on macOS
>
>
> On 18 Apr 2017, at 9:50am, Deon Brewis <[hidden email]> wrote:
>
> > “From the OSX documentation:
> >
> > Note that while fsync() will flush all data from the host to the drive
> (i.e. the "permanent storage device"),
>
> Deon,
>
> I’m not sure this is related, but have you seen
>
> <http://www.sqlite.org/pragma.html#pragma_fullfsync>
>
> <http://www.sqlite.org/pragma.html#pragma_checkpoint_fullfsync>
>
> ?
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Richard Hipp-3
On 4/18/17, Brendan Duddridge <[hidden email]> wrote:

>
> I commented out this line of code in the Couchbase lite CBL_SQliteStorage.m
> source file and no more corruption!!!!
>
> //    int err = sqlite3_config(SQLITE_CONFIG_MMAP_SIZE,
> (SInt64)kSQLiteMMapSize, (SInt64)-1);
>
>
> It would be really awesome of this were added to the How to Corrupt an
> SQLite Database web page.

I was unaware that memory-mapped I/O was busted on MacOS.  I will
investigate further, but probably the solution will be that we will
completely disable memory-mapped I/O on MacOS, just as we have had to
do for OpenBSD.  Thanks for letting us know that this is a problem.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Simon Slavin-3

On 18 Apr 2017, at 11:52am, Richard Hipp <[hidden email]> wrote:

> I was unaware that memory-mapped I/O was busted on MacOS.  I will
> investigate further, but probably the solution will be that we will
> completely disable memory-mapped I/O on MacOS, just as we have had to
> do for OpenBSD.

I’m guessing that disabling memory-mapping will also prevent the SQLITE_IOERR_VNODE error that was mentioned here last week.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Database corruption, and PRAGMA fullfsync on macOS

Jens Alfke-2
In reply to this post by Deon Brewis

> On Apr 18, 2017, at 1:50 AM, Deon Brewis <[hidden email]> wrote:
>
> Are you by change using memory mapped IO (MMAP_SIZE something other than 0)?
>
> This does not work on OSX. Not even remotely. I tracked an issue down in November 2015, and was able to trivially corrupt a database 100% of the time using the repro steps below. This happens long after our app gets shut down and SQLITE is flushed.  It got fixed when I set MMAP_SIZE to 0.

That is a hugely important piece of information, and while I thank you for letting me know, I’m shocked that this hasn’t been conveyed to the SQLite developers (or, apparently, to Apple.) Did you report it anywhere?

I’m the architect of Couchbase Lite and the lead developer for iOS and macOS. I enabled SQLite’s memory-mapped I/O at least two years ago. I never considered it could be problematic since (a) Brendan is IIRC the only iOS/Mac developer who’s reported database corruption, and (b) I assumed scenarios like this would be covered as part of SQLite testing, either by SQLite themselves or by Apple.

> “From the OSX documentation:
> Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence. Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written.  The disk drive may also re-order the data so that later writes may be present, while earlier writes are not.

This is a statement about hard disk controller firmware and is true for any OS. (I used to work for Apple, and corresponded with filesystem architect Dominic Giampaolo about this back in the day.) Some disk controllers don’t flush all cached data to the platters when told to flush, apparently to boost benchmark scores. Darwin has a FULLFSYNC command (an option to ioctl) that triggers an expensive full reset of the disk controller, which does produce a full flush.

The filesystem periodically performs FULLFSYNCs to ensure durability of changes to filesystem metadata, so that the filesystem itself can’t become corrupted by power loss. So I’m surprised that, in your test, cutting power after macroscopic amounts of time (2 minutes) have passed since closing the SQLite file still resulted in data being lost.

This doesn’t seem like it would be specific to memory-mapped I/O, though. Darwin has a universal buffer cache (like many other kernels) so memory-mapped writes and ordinary file writes are treated the same way by the filesystem cache.

Of course it’s possible there’s some kind of OS bug involved here; if so, it seems pretty serious. I’ll be glad to file a bug report with Apple if it turns out to be so.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12
Loading...