Re: FW: PRAGMA journal_size_limit prevents -wal growth or only size outside a transaction?

Previous Topic Next Topic
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Re: FW: PRAGMA journal_size_limit prevents -wal growth or only size outside a transaction?

Dan Kennedy-4
On 09/12/2017 09:24 AM, Joe Mistachkin wrote:

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]] On
> Behalf Of Howard Kapustein
> Sent: Monday, September 11, 2017 2:53 PM
> To: [hidden email]
> Subject: Re: [sqlite] PRAGMA journal_size_limit prevents -wal growth or only
> size outside a transaction?
> Looking through the source a bit I see sqlite3_busy_timeout is a no-fail
> operation (aside from misuse) so sqlite3_busy_timeout(0);
> SQLITE_CHECKPOINT_TRUNCATEsqlite3_busy_timeout(n); does what I need. Thanks
> all.
> - Howard
> P.S. sqlite3_close intentionally doesn't respect journal_size_limit. From
> sqlite3WalClose
>          }else if( pWal->mxWalSize>=0 ){
>            /* Try to truncate the WAL file to zero bytes if the checkpoint
>            ** completed and fsynced (rc==SQLITE_OK) and we are in persistent
>            ** WAL mode (bPersist) and if the PRAGMA journal_size_limit is a
>            ** non-negative value (pWal->mxWalSize>=0).  Note that we truncate
>            ** to zero bytes as truncating to the journal_size_limit might
>            ** leave a corrupt WAL file on disk. */
>            walLimitSize(pWal, 0);
>          }
> That last comment "Note that we truncate..." is interesting. When would this
> corrupt WAL rear its head?

Following recovery.

A wal file contains a series of transactions, where each transaction
consists of the modified versions of the database pages. Say a wal file
contains transactions 1, 2 and 3. Say you do a checkpoint, which copies
pages from all three transactions into the database file and then exit
leaving the wal file on disk. The next process that starts up reads the
entire wal file. If it then does a checkpoint, that's no problem - as
it's only overwriting the same database pages with the same data as the
first checkpointer did.

It would also be safe if our first checkpointer truncated the wal file
to zero bytes in size. The next client would start up and read no data
from the wal file, which is fine as all the data was already copied into
the db anyway.

But, instead of just exiting abruptly, say our first client truncated
the wal file so that it contains just transactions 1 and 2. Then another
client starts up, reads transactions 1 and 2 from the wal file and
eventually checkpoints them. In this case, if transactions 1 and 2
partially overlap (write some of the same pages) with transaction 3, the
db will become corrupted. Just as if only some of the pages in
transaction 3 had been written out in the first place.

This is why the "journal_file_size" limit is applied to the wal file by
the first writer, not the checkpointer. Only after the writer has
written a new wal file header can the wal file be safely truncated to
the journal_size_limit. Otherwise, an unlucky crash might leave us in
the situation described above, where a only a prefix of the wal file is
read following recovery.


sqlite-users mailing list
[hidden email]