SQLITE vs. OSX mmap == inevitable catalog corruption?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

SQLITE vs. OSX mmap == inevitable catalog corruption?

deon
We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't usable in any way shape or form. It will inevitably lead to catalog corruption if you hard-reboot OSX, even without the database or application open.

We've tried FULLSYNC and F_FULLFSYNC, but it makes no difference.

Repro steps:
a) Run our application and write stuff to the database
b) Close our app & wait for the .wal file to disappear AND the app to disappear from activity monitor
c) Wait another 2 minutes
d) Copy your SQLITE db file to a network share
e) Hard reboot OSX
f) After the reboot, copy the same SQLITE db to a second network share
g) File compare (d) vs. (f)

Observe: The files in (c) and (e) are virtually never identical. NOTE: There is no application or SQLITE is involved here. The app is closed. It (should have) checkpointed and flushed properly (FULLSYNC + FULLFSYNC). However, the MAC doesn't seem to write the database to disk. So once you hard-reboot you don't have the same file anymore. SQLITE detects this as catalog corruption around 25% of the time, but a file compare shows differences pretty much 100% of the time. Not just benign differences in unused pages - the header is more often than not different as well.

Without the Hard reboot our database always survives a close. You can gracefully shutdown the application, pkill it, force terminate, crash it, soft reboot - it all survives. However, once you hard reboot OSX - even AFTER the app is closed - it has a very high probability of corrupting our database.  I've seen a worse case scenario where the hard reboot followed an app graceful shutdown by 12 hours, and it still corrupted the database.

This is so easy to reproduce I'm not sure why this isn't reported as a large-scale problem? It also only reproduces on OSX (both El Capitan and Mavericks) - Android, iOS, PC all work fine.


Anyway, it's not a big deal for us to set mmap_size to 0 to work around this.

The big problem I have however is the .wal.  We use SQLITE from multiple threads, and as such it's using shared memory to read/write the .wal. However, if persisting memory mapped files on OSX is so unreliable, then how can the .wal be expected to survive a hard reboot... So I'd also like to have a way to not use memory mapped I/O for .wal files, but I don't think there is a way unless I change the architecture of my app to have single-threaded access to SQLITE? Or is there another way?

- Deon

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQLITE vs. OSX mmap == inevitable catalog corruption?

Richard Hipp-3
On 11/16/15, Deon Brewis <[hidden email]> wrote:

> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't
> usable in any way shape or form. It will inevitably lead to catalog
> corruption if you hard-reboot OSX, even without the database or application
> open.
>
> We've tried FULLSYNC and F_FULLFSYNC, but it makes no difference.
>
> Repro steps:
> a) Run our application and write stuff to the database
> b) Close our app & wait for the .wal file to disappear AND the app to
> disappear from activity monitor
> c) Wait another 2 minutes
> d) Copy your SQLITE db file to a network share
> e) Hard reboot OSX
> f) After the reboot, copy the same SQLITE db to a second network share
> g) File compare (d) vs. (f)
>
> Observe: The files in (c) and (e) are virtually never identical. NOTE: There
> is no application or SQLITE is involved here. The app is closed. It (should
> have) checkpointed and flushed properly (FULLSYNC + FULLFSYNC). However, the
> MAC doesn't seem to write the database to disk. So once you hard-reboot you
> don't have the same file anymore. SQLITE detects this as catalog corruption
> around 25% of the time, but a file compare shows differences pretty much
> 100% of the time. Not just benign differences in unused pages - the header
> is more often than not different as well.
>
> Without the Hard reboot our database always survives a close. You can
> gracefully shutdown the application, pkill it, force terminate, crash it,
> soft reboot - it all survives. However, once you hard reboot OSX - even
> AFTER the app is closed - it has a very high probability of corrupting our
> database.  I've seen a worse case scenario where the hard reboot followed an
> app graceful shutdown by 12 hours, and it still corrupted the database.
>
> This is so easy to reproduce I'm not sure why this isn't reported as a
> large-scale problem? It also only reproduces on OSX (both El Capitan and
> Mavericks) - Android, iOS, PC all work fine.
>
>
> Anyway, it's not a big deal for us to set mmap_size to 0 to work around
> this.
>
> The big problem I have however is the .wal.  We use SQLITE from multiple
> threads, and as such it's using shared memory to read/write the .wal.

No.  It uses shared memory for the ".shm" file, which is only a
performance optimization and is not used for recovery.  The ".wal"
file is written using write() or pwrite().


> However, if persisting memory mapped files on OSX is so unreliable, then how
> can the .wal be expected to survive a hard reboot... So I'd also like to
> have a way to not use memory mapped I/O for .wal files, but I don't think
> there is a way unless I change the architecture of my app to have
> single-threaded access to SQLITE? Or is there another way?

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQLITE vs. OSX mmap == inevitable catalog corruption?

Török Edwin
In reply to this post by deon
On 11/17/2015 12:11 AM, Deon Brewis wrote:
> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't usable in any way shape or form. It will inevitably lead to catalog corruption if you hard-reboot OSX, even without the database or application open.
>

I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7 and Debian Jessie, but after 24-48h of continous writes to the DB I always get a corruption:
SQLite result 0xb: database corruption at line 76915 of [767c1727fe]
SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks WHERE revision_id=:revision_id]
Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed: (code 0xb: database disk image is malformed) database disk image is malformed

After this 'pragma integrity check' says:
row 90814 missing from index sqlite_autoindex_revision_ops_1
wrong # of entries in index sqlite_autoindex_revision_ops_1
row 1046646 missing from index idx_revmap
row 1046646 missing from index sqlite_autoindex_revision_blocks_1
wrong # of entries in index idx_revmap
wrong # of entries in index sqlite_autoindex_revision_blocks_1

There are not reboots involved, just multiple processes accessing a WAL DB. Without mmap I/O I've never seen corrupted DBs in our application.

The corruption doesn't lead to any data loss, apparently only the indexes were corrupted beacuse I was able to recover the DBs with .clone+setting journal mode again.

However I haven't had the time yet to write a standalone testcase and attempt to reproduce this with SQLite alone, so I can't rule out that this isn't a memory corruption bug in our application, or one of the other libraries that we link with.

I notice that SQLite doesn't use msync, and msync(2) says 'Without use of this call there is no guarantee that changes are written back before munmap(2) is called'.
Is f(data)sync enough to ensure changed mmap pages are written to the disk?

--
Edwin Török | Co-founder and Lead Developer

Skylable open-source object storage: reliable, fast, secure
http://www.skylable.com
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQLITE vs. OSX mmap == inevitable catalog corruption?

Dan Kennedy-4
On 11/18/2015 03:37 PM, Török Edwin wrote:

> On 11/17/2015 12:11 AM, Deon Brewis wrote:
>> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't usable in any way shape or form. It will inevitably lead to catalog corruption if you hard-reboot OSX, even without the database or application open.
>>
> I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7 and Debian Jessie, but after 24-48h of continous writes to the DB I always get a corruption:
> SQLite result 0xb: database corruption at line 76915 of [767c1727fe]
> SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks WHERE revision_id=:revision_id]
> Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed: (code 0xb: database disk image is malformed) database disk image is malformed
>
> After this 'pragma integrity check' says:
> row 90814 missing from index sqlite_autoindex_revision_ops_1
> wrong # of entries in index sqlite_autoindex_revision_ops_1
> row 1046646 missing from index idx_revmap
> row 1046646 missing from index sqlite_autoindex_revision_blocks_1
> wrong # of entries in index idx_revmap
> wrong # of entries in index sqlite_autoindex_revision_blocks_1
>
> There are not reboots involved, just multiple processes accessing a WAL DB. Without mmap I/O I've never seen corrupted DBs in our application.

As of yesterday, SQLite uses a read-only mapping in mmap mode. The db
file is written using plain old write(), just as in non-mmap mode:

   http://sqlite.org/src/info/67c5d3c646c8198c

It would be interesting to know if this clears the problem in your
environment.

Dan.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQLITE vs. OSX mmap == inevitable catalog corruption?

Howard Chu
Dan Kennedy wrote:

> On 11/18/2015 03:37 PM, Török Edwin wrote:
>> On 11/17/2015 12:11 AM, Deon Brewis wrote:
>>> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't
>>> usable in any way shape or form. It will inevitably lead to catalog
>>> corruption if you hard-reboot OSX, even without the database or application
>>> open.
>>>
>> I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7
>> and Debian Jessie, but after 24-48h of continous writes to the DB I always
>> get a corruption:
>> SQLite result 0xb: database corruption at line 76915 of [767c1727fe]
>> SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks
>> WHERE revision_id=:revision_id]
>> Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed:
>> (code 0xb: database disk image is malformed) database disk image is malformed
>>
>> After this 'pragma integrity check' says:
>> row 90814 missing from index sqlite_autoindex_revision_ops_1
>> wrong # of entries in index sqlite_autoindex_revision_ops_1
>> row 1046646 missing from index idx_revmap
>> row 1046646 missing from index sqlite_autoindex_revision_blocks_1
>> wrong # of entries in index idx_revmap
>> wrong # of entries in index sqlite_autoindex_revision_blocks_1
>>
>> There are not reboots involved, just multiple processes accessing a WAL DB.
>> Without mmap I/O I've never seen corrupted DBs in our application.
>
> As of yesterday, SQLite uses a read-only mapping in mmap mode. The db file is
> written using plain old write(), just as in non-mmap mode:

That's the safest way to use mmap, but keep in mind that this requires a
unified buffer cache and systems like OpenBSD still don't have that, so this
approach will cause corruptions on systems like that.
>
>    http://sqlite.org/src/info/67c5d3c646c8198c
>
> It would be interesting to know if this clears the problem in your environment.


--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQLITE vs. OSX mmap == inevitable catalog corruption?

Richard Hipp-3
On 11/18/15, Howard Chu <[hidden email]> wrote:
> Dan Kennedy wrote:
>>
>> As of yesterday, SQLite uses a read-only mapping in mmap mode. The db file
>> is
>> written using plain old write(), just as in non-mmap mode:
>
> That's the safest way to use mmap, but keep in mind that this requires a
> unified buffer cache and systems like OpenBSD still don't have that, so this
> approach will cause corruptions on systems like that.

Thanks for the warning, Howard.

We independently discovered this back a few years ago and thus disable the mmap
capability for OpenBSD and QNX.  See
https://www.sqlite.org/src/artifact/1b8c1b37f0?ln=742-748 for the
code.  Those are the only modern platforms that we know of that give
problems.  If you are aware of others, please let us know.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SQLITE vs. OSX mmap == inevitable catalog corruption?

testn
This post has NOT been accepted by the mailing list yet.