Bug: Successfully committed transaction rolled back after power failure

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Bug: Successfully committed transaction rolled back after power failure

Meinlschmidt Stefan
Hi everybody!

TL;DR:

Shutting down power right after a successfully committed
transaction rolls back that transaction on next startup. This is a
problem in write-on-shutdown-then-power-off scenarios and violates my
expectation of SQLite's transactions being ACID. This can be fixed by
setting the dirSync-flag of the sqlite3OsDelete-call within
pager_end_transaction.

Full:

We are using a slightly patched SQLite 3.7.5 in an embedded system
running under QNX 6.5 as a data store that does the right thing even
when power fails, which has worked remarkably well up to now. After
migrating the database onto a QNX6 filesystem (which does not write
through synchronously) we almost reliably lose the last transaction on
power off, which happens ~1.5s after. We use journal mode, BTW, for its
more reliably limited file sizes.

While 3.7.5 is quite old, the same behaviour can easily be demonstrated
with a simple test program using sqlite-amalgamation-201601141433. To
try yourself, unpack sqlite3.{c,h} into the same directory as the
attached files and run make.

The test program opens a database in the current directory, creates a
table and adds a row to it, then asks you to power off. I did this on
our device, running the snoopy filesystem access logger alongside to see
what's happening:

> # ls -l
> total 16
> drwxr-xr-x  2 root      root           4096 Jan 01 00:02 .
> drwxrwxr-x  4 root      root           4096 Jan 01 00:02 ..
> # /fs/sda0/snoopy `pwd`
> # /fs/sda0/uncommit
> opening uncommit.sqlite...
> 4382854,1,,READLINK,/mnt/boardbook/test/uncommit.sqlite,No such file or directory
> 4382854,1,,OPEN,/mnt/boardbook/test/.,No error,
> 4382854,1,,CLOSE,/mnt/boardbook/test/.,No error
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No such file or directory,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,rwc
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 4382854,1,,DEVCTL,/mnt/boardbook/test/uncommit.sqlite,No error,40a8020d,FSYS_STATVFS
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,0
> 4382854,1,,READ,/mnt/boardbook/test/uncommit.sqlite,No error,0/100
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No such file or directory,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-wal,No such file or directory
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No such file or directory,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-wal,No such file or directory
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No error,rwc
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite-journal,No error,file
> 4382854,1,,CHOWN,/mnt/boardbook/test/uncommit.sqlite-journal,No error,0:0
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite-journal,No error,file
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite-journal,No error,0
> 4382854,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite-journal,No error,512/512
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,0
> 4382854,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite,No error,1024/1024
> 4382854,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,1024
> 4382854,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite,No error,1024/1024
> 4382854,1,,FSYNC,/mnt/boardbook/test/uncommit.sqlite,No error
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite-journal,No error
> 4382854,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-journal,No error
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 4382854,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 4382854,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> POWER OFF NOW!
Directly afterwards I yanked the cable. After reboot:

> # ls -l
> total 21
> drwxr-xr-x  2 root      root           4096 Jan 01 00:04 .
> drwxrwxr-x  4 root      root           4096 Jan 01 00:02 ..
> -rw-r--r--  1 root      root           2048 Jan 01 00:04 uncommit.sqlite
> -rw-r--r--  1 root      root            512 Jan 01 00:04 uncommit.sqlite-journal

The journal file is still there despite having been successfully
unlinked 9 lines before “POWER OFF NOW!” above. Which is not surprising,
given that the directory has not been synced in any way.

I did the same thing with a patched SQLite, changing one line in
pager_end_transaction() from

        rc = sqlite3OsDelete(pPager->pVfs, pPager->zJournal, 0);

to

        rc = sqlite3OsDelete(pPager->pVfs, pPager->zJournal, 1);

> # rm *
> # /fs/sda0/snoopy `pwd`
> # /fs/sda0/uncommit-patched
> opening uncommit.sqlite...
> 3252313,1,,READLINK,/mnt/boardbook/test/uncommit.sqlite,No such file or directory
> 3252313,1,,OPEN,/mnt/boardbook/test/.,No error,
> 3252313,1,,CLOSE,/mnt/boardbook/test/.,No error
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No such file or directory,
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,rwc
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 3252313,1,,DEVCTL,/mnt/boardbook/test/uncommit.sqlite,No error,40a8020d,FSYS_STATVFS
> 3252313,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,0
> 3252313,1,,READ,/mnt/boardbook/test/uncommit.sqlite,No error,0/100
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No such file or directory,
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-wal,No such file or directory
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No such file or directory,
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-wal,No such file or directory
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite-journal,No error,rwc
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite-journal,No error,file
> 3252313,1,,CHOWN,/mnt/boardbook/test/uncommit.sqlite-journal,No error,0:0
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite-journal,No error,file
> 3252313,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite-journal,No error,0
> 3252313,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite-journal,No error,512/512
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,0
> 3252313,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite,No error,1024/1024
> 3252313,1,,LSEEK,/mnt/boardbook/test/uncommit.sqlite,No error,1024
> 3252313,1,,WRITE,/mnt/boardbook/test/uncommit.sqlite,No error,1024/1024
> 3252313,1,,FSYNC,/mnt/boardbook/test/uncommit.sqlite,No error
> 3252313,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite-journal,No error
> 3252313,1,,UNLINK,/mnt/boardbook/test/uncommit.sqlite-journal,No error
> 3252313,1,,OPEN,/mnt/boardbook/test,No error,r
> 3252313,1,,FSYNC,/mnt/boardbook/test,No error
> 3252313,1,,CLOSE,/mnt/boardbook/test,No error
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,LOCK,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,OPEN,/mnt/boardbook/test/uncommit.sqlite,No error,
> 3252313,1,,STAT,/mnt/boardbook/test/uncommit.sqlite,No error,file
> 3252313,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> 3252313,1,,CLOSE,/mnt/boardbook/test/uncommit.sqlite,No error
> POWER OFF NOW!
Yank, reboot,

> # ls -l
> total 20
> drwxr-xr-x  2 root      root           4096 Jan 01 00:01 .
> drwxrwxr-x  4 root      root           4096 Jan 01 00:02 ..
> -rw-r--r--  1 root      root           2048 Jan 01 00:01 uncommit.sqlite

No zombie journal file this time, as expected, as unlinking the journal
is followed by fsyncing the directory. The same patch also fixes our
original problem.

I found some old email exchange about seemingly the same topic (see
http://sqlite.1065341.n5.nabble.com/Journal-deletion-no-directory-fsync-td37700.html),
but no conclusion. Therefore I'm not sure whether this is a genuine bug
or merely a tradeoff “it's more efficient that way and when power fails
a slight bit earlier the transaction is interrupted anyway”.

Please consider adding the fix to the code, at least as -D option.
SQLite journal mode doesn't guarantee durability otherwise.

S.M.
--
Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
Am Wolfsmantel 46, 91058 Tennenlohe, Germany
Tel: +49-8458-3332-531  [hidden email]
Fax: +49-8458-3332-20-531

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Makefile (136 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Rowan Worth
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup.

nitpick: This is sqlite behaving as advertised. See
https://www.sqlite.org/lockingv3.html section 5.0 step 6, and
https://www.sqlite.org/atomiccommit.html section 3.11 which explain that
the presence of a [well-formed] journal file is the mechanism via which
sqlite discriminates between a committed and in-progress transaction.

ie. according to sqlite a transaction is *not* successfully committed if
the journal file is still present, so its well within its rights to
rollback in this scenario.


That said, syncing the directory doesn't sound like a terrible idea. But
it's not clear to me that the cost of another sync every transaction is
worth a marginal reduction in the power-failure-leads-to-rollback window.
That's if it even reduces the window; it wouldn't surprise me to find that
the journal is removed from disk after the same delay both with and without
dirsync, the difference being that the dirsync prevents sqlite from
returning control to your code until its certain the commit has persisted.

There's certainly a surprising result here:

    if (sqlite3_exec("COMMIT") == SQLITE_OK) {
        /* post-commit logic */
    }

Logically, the code in the if block can reasonably assume that the
transaction will not rollback. But as you have discovered this is not
always true with JOURNAL_MODE=DELETE unless the dirsync is performed. To be
fair I don't think there are many power-failure scenarios where the
post-commit logic would have a chance to do anything significant, so the
incorrect assumption will usually be moot.

In your case it sounds like a controlled shutdown - is there a reason you
don't do a full disk sync before that?

-Rowan

On 19 January 2016 at 21:33, Meinlschmidt Stefan <
[hidden email]> wrote:

> Hi everybody!
>
> TL;DR:
>
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup. This is a
> problem in write-on-shutdown-then-power-off scenarios and violates my
> expectation of SQLite's transactions being ACID. This can be fixed by
> setting the dirSync-flag of the sqlite3OsDelete-call within
> pager_end_transaction.
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Meinlschmidt Stefan
Hi Rowan!

>> Shutting down power right after a successfully committed
>> transaction rolls back that transaction on next startup.
>
> nitpick: This is sqlite behaving as advertised. See
> https://www.sqlite.org/lockingv3.html section 5.0 step 6, and
> https://www.sqlite.org/atomiccommit.html section 3.11 which explain that
> the presence of a [well-formed] journal file is the mechanism via which
> sqlite discriminates between a committed and in-progress transaction.
>
> ie. according to sqlite a transaction is *not* successfully committed if
> the journal file is still present, so its well within its rights to
> rollback in this scenario.

Yes, a hot journal file means an incomplete transaction and should
absolutly roll back. What actually bugs me is that this happens after
COMMIT has returned without an error.

> That said, syncing the directory doesn't sound like a terrible idea. But
> it's not clear to me that the cost of another sync every transaction is
> worth a marginal reduction in the power-failure-leads-to-rollback window.

The actual costs are not clear to me either. I hope that someone on the
list with more experience in that field knows more.

> That's if it even reduces the window; it wouldn't surprise me to find that
> the journal is removed from disk after the same delay both with and without
> dirsync, the difference being that the dirsync prevents sqlite from
> returning control to your code until its certain the commit has persisted.

The filesystem used (QNX6) is advertised as performing a full sync on
every fsync and automatically some delay time after each write (10s),
and it looks like it is really doing so. If it wouldn't sync on fsync,
or if fsync would just wait until the automatic sync is through, I would
surely open a ticket with QNX and have some stern words with their resident.

That said I would expect the dirsync to shorten the window of
unnecessary rollback by an average of 5s in our case, which may or may
not be relevant to other users. On other filesystems with longer times
until it syncs anyway (I remember 30s from my first Linux box) the
reduction might be more substantial.

> There's certainly a surprising result here:
>
>     if (sqlite3_exec("COMMIT") == SQLITE_OK) {
>         /* post-commit logic */
>     }
>
> Logically, the code in the if block can reasonably assume that the
> transaction will not rollback.

And that's where the potential for a tradeoff lies. Should SQLite
guarantee that in post-commit the transaction is done, durably, even for
the cost of the additional? Or should it not guarantee durability an be
a bit (or much, I don't know) faster?

Personally I tend to assume a database is made for guaranteeing ACID
properties, except when the user explicitly decides otherwise. So in my
eyes an fsync plus possibly an #ifdef for explicitly disabling it would
be the cleanest thing. OTOH it might surprise users that are used to the
higher performance/lower load and don't care for durability, so an
#ifdef for explicitly enabling the fsync instead might still be a good
compromise.

> But as you have discovered this is not
> always true with JOURNAL_MODE=DELETE unless the dirsync is performed. To be
> fair I don't think there are many power-failure scenarios where the
> post-commit logic would have a chance to do anything significant, so the
> incorrect assumption will usually be moot.

Yes, in a normal power failure you “only” win more time, a few seconds
or more, depending on the filesystem's details.

BTW., as you have mentioned JOURNAL_MODE=DELETE–I have not examined what
happens in the other journal modes, i.e. I'm not sure whether
invalidating the journal by other means has the same lack of sync or
not. A proper patch might need to involve the other modes, too.

> In your case it sounds like a controlled shutdown - is there a reason you
> don't do a full disk sync before that?

Yes, it is a controlled shutdown, so in my case the /* post-commit logic
*/ basically pulls the plug.

Trouble is that I only control the database, not the shutdown procedure
(this is a commercial product with several hundred people working on
different aspects of the system). So while I can try to ask the shutdown
crew to sync like any real computer would do, I ultimately have no
saying in that but still need to ensure survival of data.

Without the patch (which I do have applied locally, of course), me and
everyone with a similar usecase get into trouble for relying on the
quite excellent reputation of SQLite. Actually I first fingerpointed to
the flash hardware or its driver, because “SQLite is well-tested and
doesn't have this kind of bugs” :-)

S.M.
--
Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
Am Wolfsmantel 46, 91058 Tennenlohe, Germany
Tel: +49-8458-3332-531  [hidden email]
Fax: +49-8458-3332-20-531
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Richard Hipp-3
In reply to this post by Meinlschmidt Stefan
On 1/19/16, Meinlschmidt Stefan <[hidden email]> wrote:
>
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup.

As you observe, this is a file-system dependent thing, and probably
only happens on the QNX filesystem.  I will see if we can add a
compile-time option to the next release that will allow you to force a
directory sync after the rollback journal commits.  But this won't be
on by default because it would do nothing but slow down commits on the
overwhelming majority of SQLite users.

Meanwhile, a good work-around for you might be to use either

     PRAGMA journal_mode=TRUNCATE;
     PRAGMA journal_mode=PERSIST;

Both of which sync the rollback-journal upon commit.  Or, use:

     PRAGMA journal_mode=WAL; PRAGMA synchronous=FULL;

which causes the write-ahead log to be synced following every commit.
Note that the default behavior for WAL mode is that the WAL is not
synced, and hence committed transactions might rollback following a
power loss.  This is the behavior most people desire (not that their
transactions roll back but rather they are willing to endure that in
exchange for fewer fsyncs.)  You must set "PRAGMA synchronous=FULL" to
cause the WAL to be synced after each transaction.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Richard Hipp-3
In reply to this post by Meinlschmidt Stefan
On 1/19/16, Meinlschmidt Stefan <[hidden email]> wrote:
>
> Shutting down power right after a successfully committed
> transaction rolls back that transaction on next startup.

Patches checked in:

    https://www.sqlite.org/src/info/30671345b1c1ee55
    https://www.sqlite.org/draft/compile.html#extra_durable

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Matthias-Christian Ott
On 25/01/16 14:14, Richard Hipp wrote:
> On 1/19/16, Meinlschmidt Stefan <[hidden email]> wrote:
>>
>> Shutting down power right after a successfully committed
>> transaction rolls back that transaction on next startup.
>
> Patches checked in:
>
>     https://www.sqlite.org/src/info/30671345b1c1ee55
>     https://www.sqlite.org/draft/compile.html#extra_durable

Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could loose a
transaction that it said to be committed depending on the VFS?

If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
be more important than performance, except when the constraints are such
that correctness has to be sacrificed for performance?

The trade-off that is described in the description of SQLite
SQLITE_EXTRA_DURABLE reads like an excerpt from the MySQL manual when
MyISAM was still widely used. Perhaps I'm also too irritated by
discussions with advocates of MySQL who would argue against the fact
that proper transactions were necessary because the DBMS would be faster
without them. That is not to say that the ACID properties and
transactions solve every concurrency or correctness problem but they
help significantly.

- Matthias-Christian

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Richard Hipp-3
On 1/25/16, Matthias-Christian Ott <[hidden email]> wrote:
>
> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could loose a
> transaction that it said to be committed depending on the VFS?

Sort of.  This appears to be true if you are running on QNX and you
lose power (or do a hard reset) shortly after the transaction commits.
It might be the case on other OSes/filesystems but it has never before
been observed.

This is not new behavior.  This is apparently what SQLite has been
doing for 15 years, across quadrillions of transactions on many
billions of devices, and it has never before caused any issues, until
just recently when Mr. Meinlschmidt upgraded to a newer version of
QNX.

>
> If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
> be more important than performance, except when the constraints are such
> that correctness has to be sacrificed for performance?
>
> The trade-off that is described in the description of SQLite
> SQLITE_EXTRA_DURABLE reads like an excerpt from the MySQL manual when
> MyISAM was still widely used. Perhaps I'm also too irritated by
> discussions with advocates of MySQL who would argue against the fact
> that proper transactions were necessary because the DBMS would be faster
> without them. That is not to say that the ACID properties and
> transactions solve every concurrency or correctness problem but they
> help significantly.
>

As you point out, it is an engineering tradeoff.

The feedback I receive is that most users of SQLite would much rather
avoid the extra directory syncs, even if it means having the last
transaction rollback following a power loss.  Most developers do not
care that much about durability, at least not enough to want to take
the performance hit of syncing the directory after every unlink.
Non-durable commits on power-loss have long been the default in WAL
mode (run-time fixable by setting PRAGMA synchronous=FULL) and nobody
has before ever complained.  Most people consider this a feature.  In
fact, if I recall correctly, we first made synchronous=NORMAL the
default in WAL mode by popular request.  WAL mode used to default to
power-loss durable but people requested the change for performance
reasons.

Note especially that this is about durability, not consistency.
SQLite guarantees consistency regardless.  People care about
consistency.  Durability, not so much.  I'm not a MySQL expert, but I
think the historical complaints about MyISAM had more to do with
consistency than with durability, did they not?

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Stephen Chrzanowski
In reply to this post by Matthias-Christian Ott
On Mon, Jan 25, 2016 at 10:08 AM, Matthias-Christian Ott <[hidden email]>
wrote:

>
> If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
> be more important than performance, except when the constraints are such
> that correctness has to be sacrificed for performance?
>

I wouldn't want that, as that would entirely depend on the application.
Maybe I'm an edge case, but I've written apps where the database is
'production', but not 'valuable' enough to have performance slapped.  The
app didn't care if the data was destroyed.  It'd go and recompile what it
needed from the external sources from which the now-destroyed database was
created on.  It basically was a cache type implementation, but changes to
the database then were pushed to whatever the external source was.

SQLite is also mostly looking at devices that are embeded, limited on
performance, and not a lot of wiggle room where performance would seriously
degrade the "worth-while-ness" of the device.  ... which is why I no longer
have a Rogers PVR as it'd take 5+ seconds to do anything...ahem... Anyways..

You also have to look at balance across many millions (or is it billions?)
of devices out there that use SQLite for their primary operations.
Slapping a serious performance decrease on devices where time and
performance is essential BY DEFAULT seems a little strict.  And all at the
cost of one transaction?  To some, maybe a worthwhile trade off, sure, but
I'm not sure I'm be overly outraged if my wifes music box (iPod) suddenly
had to recompile its music list.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Richard Hipp-3
On 1/25/16, Stephen Chrzanowski <[hidden email]> wrote:
>
> You also have to look at balance across many millions (or is it billions?)
> of devices out there that use SQLite for their primary operations.

Billions and billions.

> Slapping a serious performance decrease on devices where time and
> performance is essential BY DEFAULT seems a little strict.  And all at the
> cost of one transaction?  To some, maybe a worthwhile trade off, sure, but
> I'm not sure I'm be overly outraged if my wifes music box (iPod) suddenly
> had to recompile its music list.

I don't think it is even that serious.  This problem is that if you
(for example) set a new bookmark on your browser just as the cat is
tripping over the power cord, then after reboot the bookmark
disappears.  The bookmark database is still completely intact - it
just went backwards in time a little.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Stephan Beal-3
On Mon, Jan 25, 2016 at 5:08 PM, Richard Hipp <[hidden email]> wrote:

> On 1/25/16, Stephen Chrzanowski <[hidden email]> wrote:
> >
> > You also have to look at balance across many millions (or is it
> billions?)
> > of devices out there that use SQLite for their primary operations.
>
> Billions and billions.
>

https://en.wikipedia.org/wiki/Billions_and_Billions


> I don't think it is even that serious.  This problem is that if you
> (for example) set a new bookmark on your browser just as the cat is
> tripping over the power cord, then after reboot the bookmark
> disappears.  The bookmark database is still completely intact - it
> just went backwards in time a little.
>

Or, alternately, it _never went forward_ in time.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

R Smith
In reply to this post by Matthias-Christian Ott


On 2016/01/25 5:08 PM, Matthias-Christian Ott wrote:

> On 25/01/16 14:14, Richard Hipp wrote:
>> On 1/19/16, Meinlschmidt Stefan <[hidden email]> wrote:
>>> Shutting down power right after a successfully committed
>>> transaction rolls back that transaction on next startup.
>> Patches checked in:
>>
>>      https://www.sqlite.org/src/info/30671345b1c1ee55
>>      https://www.sqlite.org/draft/compile.html#extra_durable
> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could loose a
> transaction that it said to be committed depending on the VFS?
>
> If so, why isn't SQLITE_EXTRA_DURABLE=1 the default? Should correctness
> be more important than performance, except when the constraints are such
> that correctness has to be sacrificed for performance?

I understand your concern, but don't confuse the loss of a transaction
with any form of inconsistency or broken database integrity. It
essentially means that on some OSes (to knowledge, only QNX currently)
in the event of power loss or system breakdown right at the moment you
were editing and saving some value, you might suffer the inconvenience
of having to add that one song to your music list again, or have to add
the name of that one contact again. This is not really an
end-of-the-World-OMG-we-all-gonna-die! problem in general application
terms. It's quite acceptable to 99% of databases.

Sure, if your database is critical and needs the durability to be
absolute, you HAVE to go full synchronous regardless of OS, but that
must already be obvious at the very onset of your design and everything
you do or decide must keep this in mind. Such a developer would never
simply roll with the defaults of any DB system - I'd hope.

The very idea of defaults is to cater for the majority, not the rarest
of use-cases.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Matthias-Christian Ott
In reply to this post by Richard Hipp-3
On 2016-01-25 16:47, Richard Hipp wrote:

> On 1/25/16, Matthias-Christian Ott <[hidden email]> wrote:
>>
>> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
>> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could loose a
>> transaction that it said to be committed depending on the VFS?
>
> Sort of.  This appears to be true if you are running on QNX and you
> lose power (or do a hard reset) shortly after the transaction commits.
> It might be the case on other OSes/filesystems but it has never before
> been observed.
>
> This is not new behavior.  This is apparently what SQLite has been
> doing for 15 years, across quadrillions of transactions on many
> billions of devices, and it has never before caused any issues, until
> just recently when Mr. Meinlschmidt upgraded to a newer version of
> QNX.

So it would make sense to add a note that you should check whether your
target VFS and target operating environment needs an fsync after a
journal commit if you want to use this journal mode. Would it be
possible to make SQLITE_EXTRA_DURABLE a pragma? Some GNU/Linux
distributions package SQLite and therefore not every application can
compile SQLite with different options.

> Note especially that this is about durability, not consistency.
> SQLite guarantees consistency regardless.  People care about
> consistency.  Durability, not so much.  I'm not a MySQL expert, but I
> think the historical complaints about MyISAM had more to do with
> consistency than with durability, did they not?

That's also my understanding. It's unarguably an odd comparison but I
was referring to the ACID properties in general.

- Matthias-Christian
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Howard Chu
Matthias-Christian Ott wrote:

> On 2016-01-25 16:47, Richard Hipp wrote:
>> On 1/25/16, Matthias-Christian Ott <[hidden email]> wrote:
>>>
>>> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
>>> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could loose a
>>> transaction that it said to be committed depending on the VFS?
>>
>> Sort of.  This appears to be true if you are running on QNX and you
>> lose power (or do a hard reset) shortly after the transaction commits.
>> It might be the case on other OSes/filesystems but it has never before
>> been observed.
>>
>> This is not new behavior.  This is apparently what SQLite has been
>> doing for 15 years, across quadrillions of transactions on many
>> billions of devices, and it has never before caused any issues, until
>> just recently when Mr. Meinlschmidt upgraded to a newer version of
>> QNX.
>
> So it would make sense to add a note that you should check whether your
> target VFS and target operating environment needs an fsync after a
> journal commit if you want to use this journal mode. Would it be
> possible to make SQLITE_EXTRA_DURABLE a pragma? Some GNU/Linux
> distributions package SQLite and therefore not every application can
> compile SQLite with different options.

The question isn't just whether "an fsync is needed" with journal mode - the
question is *which* fsync is needed? The issue here is that file
creation/deletion/rename ops require an fsync *on the containing directory*.
This is actually quite an unusual requirement; on older Unix systems you
couldn't even *open* a directory, let alone obtain write access to it or fsync it.

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Richard Hipp-3
On 1/25/16, Howard Chu <[hidden email]> wrote:
>
> This is actually quite an unusual requirement; on older Unix systems you
> couldn't even *open* a directory, let alone obtain write access to it or
> fsync it.

Yeah.  When the SQLITE_DISABLE_DIRSYNC compile-time option is present,
we disable the directory sync logic for this reason.  Some unixes
(HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work.  But Linux,
MacOS, and *BSD all work without it, so I thought I'd just not bring
that up...
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Howard Chu
Richard Hipp wrote:

> On 1/25/16, Howard Chu <[hidden email]> wrote:
>>
>> This is actually quite an unusual requirement; on older Unix systems you
>> couldn't even *open* a directory, let alone obtain write access to it or
>> fsync it.
>
> Yeah.  When the SQLITE_DISABLE_DIRSYNC compile-time option is present,
> we disable the directory sync logic for this reason.  Some unixes
> (HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work.  But Linux,
> MacOS, and *BSD all work without it, so I thought I'd just not bring
> that up...

I would have to say this is a BSD OS bug as it breaks the guarantees stated in
the manpages. I.e., rename() and unlink() syscalls are documented to be
atomic, and fsync() doesn't say anything about being needed to sync a directory.

http://www.unix.com/man-page/FreeBSD/2/fsync
http://www.unix.com/man-page/FreeBSD/2/unlink
http://www.unix.com/man-page/FreeBSD/2/rename/

I no longer have BSD source code on hand but I'd bet that when those manpages
were written, all directory modifications in the BSD ffs were always
synchronous. Linux obviously changed this but at least their fsync() manpage
documents the behavior.

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Warren Young-2
In reply to this post by Richard Hipp-3
On Jan 25, 2016, at 8:47 AM, Richard Hipp <[hidden email]> wrote:
>
> The feedback I receive is that most users of SQLite would much rather
> avoid the extra directory syncs, even if it means having the last
> transaction rollback following a power loss.

Why not do the directory fsync in sqlite3_close_v2()?

As I understand the original problem, this is happening in a system doing a controlled shutdown, not a crash or spontaneous reboot.

I’m close to this problem at the moment because we recently switched to using WAL mode by default, and I noticed that if there is a code path that causes sqlite3_close_v2() to be skipped, the WAL file remains behind, causing a SQLITE_NOTICE_RECOVER_WAL complaint when the app starts back up again.

If the last writer to a SQLite DB closes its connection down gracefully, there should be no uncertainty about whether all transactions have durably hit the disk.

I can live with such uncertainty if the last writer *doesn’t* gracefully close its connection.  That’s kind of concomitant with using an in-process DBMS.

(Contrasting with a client-server DBMS, where durability is not compromised if a remote client disappears after COMMIT without gracefully closing its TCP connection afterward.)
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Rowan Worth
In reply to this post by Meinlschmidt Stefan
On 25 January 2016 at 18:26, Meinlschmidt Stefan <
[hidden email]> wrote:

> > In your case it sounds like a controlled shutdown - is there a reason you
> > don't do a full disk sync before that?
>
> Yes, it is a controlled shutdown, so in my case the /* post-commit logic
> */ basically pulls the plug.
>
> Trouble is that I only control the database, not the shutdown procedure
> (this is a commercial product with several hundred people working on
> different aspects of the system). So while I can try to ask the shutdown
> crew to sync like any real computer would do, I ultimately have no
> saying in that but still need to ensure survival of data.
>
> Without the patch (which I do have applied locally, of course), me and
> everyone with a similar usecase get into trouble for relying on the
> quite excellent reputation of SQLite. Actually I first fingerpointed to
> the flash hardware or its driver, because “SQLite is well-tested and
> doesn't have this kind of bugs” :-)
>

But if the shutdown procedure doesn't actually sync the disk, surely its
not just sqlite users that will suffer? _Anything_ which has done a write()
in the last ~10 seconds without sync()ing is going to lose data, and unlike
sqlite the vast majority of standard tools _never_ call sync().

Obviously I don't know much about your project/environment, so perhaps this
is what you want. It just strikes me as a problem easier solved at the
system level is all :)

-Rowan
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Bernard McNeill
In reply to this post by Richard Hipp-3
Just to be clear

Situation: Under Linux/Debian, Sqlite opens an entirely valid DB, and runs
an entirely valid SQL transaction against that database.
Following a Commit, the application gets back a 'Commit Successful' code.
(Ignore any issues of disks returning hardware 'write done' flags
prematurely).
There is then a power/OS failure.

Using the standard defaults (which avoid WAL), is there any possibility
whatsoever of that last SQL transaction being lost?


Best regards

On Mon, Jan 25, 2016 at 8:39 PM, Richard Hipp <[hidden email]> wrote:

> On 1/25/16, Howard Chu <[hidden email]> wrote:
> >
> > This is actually quite an unusual requirement; on older Unix systems you
> > couldn't even *open* a directory, let alone obtain write access to it or
> > fsync it.
>
> Yeah.  When the SQLITE_DISABLE_DIRSYNC compile-time option is present,
> we disable the directory sync logic for this reason.  Some unixes
> (HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work.  But Linux,
> MacOS, and *BSD all work without it, so I thought I'd just not bring
> that up...
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

Simon Slavin-3

On 27 Jan 2016, at 8:51am, Bernard McNeill <[hidden email]> wrote:

> Situation: Under Linux/Debian, Sqlite opens an entirely valid DB, and runs
> an entirely valid SQL transaction against that database.
> Following a Commit, the application gets back a 'Commit Successful' code.
> (Ignore any issues of disks returning hardware 'write done' flags
> prematurely).

We can ignore this, but almost all systems are set up that way.  So we shouldn't.

> There is then a power/OS failure.
>
> Using the standard defaults (which avoid WAL), is there any possibility
> whatsoever of that last SQL transaction being lost?

If all of these ...

1) there was a gap of a second or so for the writing commands to complete
2) the power failure doesn't cause spurious data to be written to disk
3) the power failure doesn't cause low-level disk format problems
4) the power failure doesn't physically damage the drive
5) some other part of the OS doesn't get confused and write over the database

then you should have an uncorrupted database with the last transaction correctly written.

Under these circumstances the database is held not just in the database file but partly in the journal file.  If you analyse just the database file you may conclude that you have a corrupted database or that the most recent transaction has been lost.  One of the jobs of the SQLite _open() commands is to notice that the database file and/or the journal file indicate that the database wasn't closed cleanly, and to rescue as recent as possible uncorrupted database.  It will do this without informing the calling program.  The calling program will just thing the database was fine to begin with.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Bug: Successfully committed transaction rolled back after power failure

James K. Lowden
In reply to this post by Bernard McNeill
On Wed, 27 Jan 2016 08:51:16 +0000
Bernard McNeill <[hidden email]> wrote:

> Using the standard defaults (which avoid WAL), is there any
> possibility whatsoever of that last SQL transaction being lost?

I have an unusual answer: Yes, and it doesn't matter.  

Let's suppose, as you did, that the application got back "Commit
Successful" followed quickly by a power failure.  You want to know,
could the transaction be lost anyway?  But I ask you, what action could
the application possibly take, in that subsecond interval, that it
matters?  The failure could have happened an instant earlier, just
before "Commit Successful", and it would have to recover from that.  I
can think of no scenario in which the committed-but-lost transaction is
very different from the uncommitted-and-lost transaction.  

There is no God's-eye view of application state.  The important service
provided by the DBMS is not "what's committed is definitely saved", but
rather that "what's committed is definitely *consistent*".  If on
recovery the DBMS finds that,  contrary to its prior report, the state
of the database does not warrant including the final transaction, it
rolls it back, and presents the data in an internally consistent
state.  The application doesn't have to cope with the ledger being out
of balance and suchlike.  

--jkl
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
123