advice about schema versioning and WAL

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

advice about schema versioning and WAL

Gwendal Roué-2
Hello,

Season's greetings to all SQLite fellows!


I'm developping a library that would like to keep a "cache" of some information about the database schema. Such information are the columns of a table, its primary key, or its indexes. The purpose of this cache is not really speed, even if it may help, but mainly to avoid cluttering the sqlite3_trace/sqlite3_trace_v2 hooks with noisy pragmas whenever the library needs to infer some implicit information from the actual database schema.

This cache has to be invalidated whenever the schema changes. The Compile-Time Authorization Callback [1] is the perfect tool for the job, since it allows to identify statements that create, drop, alter tables and indexes.

Everything is fine and easy when a single connection is used in a single-threaded way: statements are executed one after the other, and the management of the schema cache is trivial.


It's much less trivial with the WAL mode. I focus on a setup which uses a unique writer connection, and several reader connections. All connections are used sequentially in their own thread, but readers and writer can run concurrently in order to take advantage from the WAL mode.

When a read-only connection uses a deferred transaction to enter snapshot isolation, it doesn't see the changes performed by other transactions. For example, if a reader acquires snapshot isolation before a table is altered by the writer, it won't see the alteration until it commits its deferred transaction. I wish my schema cache would behave the same.

To be precise, I only have two important needs:

1. A connection's schema cache is correct, which means that it never contains information that does not match SQLite's genuine view of the database schema. Being invalidated/empty is correct, if not efficient (the missing information is then loaded from SQLite).
2. Synchronization points between readers and writers are avoided (non-blocking access is the whole point of WAL, and I want to avoid locks as much as possible)

I was hoping that a connection would have a "schema version": an automatically incremented value that SQLite bumps whenever the schema is changed. That would have been enough for my use case. Unfortunately, PRAGMA schema_version reads the database header, and I thus guess that it does not play well with WAL (I'm not sure). Furthermore, PRAGMA schema_version clutters the tracing hook.

The most simple solution I have is to invalidate a reader's schema cache each time it is used. This would unfortunately invalidate the readers' caches too often, since most real-life uses only alter the schema at application start-up, which means that the schema is, practically speaking, stable after this initialisation phase.

Do any of you have any better idea?

Thanks in advance, regards,
Gwendal Roué

[1] https://sqlite.org/c3ref/set_authorizer.html
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: advice about schema versioning and WAL

Dan Kennedy-4
On 12/29/2017 01:28 AM, Gwendal Roué wrote:

> Hello,
>
> Season's greetings to all SQLite fellows!
>
>
> I'm developping a library that would like to keep a "cache" of some information about the database schema. Such information are the columns of a table, its primary key, or its indexes. The purpose of this cache is not really speed, even if it may help, but mainly to avoid cluttering the sqlite3_trace/sqlite3_trace_v2 hooks with noisy pragmas whenever the library needs to infer some implicit information from the actual database schema.
>
> This cache has to be invalidated whenever the schema changes. The Compile-Time Authorization Callback [1] is the perfect tool for the job, since it allows to identify statements that create, drop, alter tables and indexes.
>
> Everything is fine and easy when a single connection is used in a single-threaded way: statements are executed one after the other, and the management of the schema cache is trivial.
>
>
> It's much less trivial with the WAL mode. I focus on a setup which uses a unique writer connection, and several reader connections. All connections are used sequentially in their own thread, but readers and writer can run concurrently in order to take advantage from the WAL mode.
>
> When a read-only connection uses a deferred transaction to enter snapshot isolation, it doesn't see the changes performed by other transactions. For example, if a reader acquires snapshot isolation before a table is altered by the writer, it won't see the alteration until it commits its deferred transaction. I wish my schema cache would behave the same.
>
> To be precise, I only have two important needs:
>
> 1. A connection's schema cache is correct, which means that it never contains information that does not match SQLite's genuine view of the database schema. Being invalidated/empty is correct, if not efficient (the missing information is then loaded from SQLite).
> 2. Synchronization points between readers and writers are avoided (non-blocking access is the whole point of WAL, and I want to avoid locks as much as possible)
>
> I was hoping that a connection would have a "schema version": an automatically incremented value that SQLite bumps whenever the schema is changed. That would have been enough for my use case. Unfortunately, PRAGMA schema_version reads the database header, and I thus guess that it does not play well with WAL (I'm not sure).


"PRAGMA schema_version" should work the same way in WAL mode. The pragma
will read the "database header" from the newest version of page 1 in the
wal file if required.

Dan.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: advice about schema versioning and WAL

Gwendal Roué-2
In reply to this post by Gwendal Roué-2
Thanks Dan for this information!

(For some reason, your reply appears on the mailing list archive, but never found the way of my inbox. I'm replying to my own message)

Gwendal

> Dan Kennedy wrote:
>
> "PRAGMA schema_version" should work the same way in WAL mode. The pragma
> will read the "database header" from the newest version of page 1 in the
> wal file if required.
>
> Dan.
>
>> Le 28 déc. 2017 à 19:28, Gwendal Roué <[hidden email]> a écrit :
>>
>> Hello,
>>
>> Season's greetings to all SQLite fellows!
>>
>>
>> I'm developping a library that would like to keep a "cache" of some information about the database schema. Such information are the columns of a table, its primary key, or its indexes. The purpose of this cache is not really speed, even if it may help, but mainly to avoid cluttering the sqlite3_trace/sqlite3_trace_v2 hooks with noisy pragmas whenever the library needs to infer some implicit information from the actual database schema.
>>
>> This cache has to be invalidated whenever the schema changes. The Compile-Time Authorization Callback [1] is the perfect tool for the job, since it allows to identify statements that create, drop, alter tables and indexes.
>>
>> Everything is fine and easy when a single connection is used in a single-threaded way: statements are executed one after the other, and the management of the schema cache is trivial.
>>
>>
>> It's much less trivial with the WAL mode. I focus on a setup which uses a unique writer connection, and several reader connections. All connections are used sequentially in their own thread, but readers and writer can run concurrently in order to take advantage from the WAL mode.
>>
>> When a read-only connection uses a deferred transaction to enter snapshot isolation, it doesn't see the changes performed by other transactions. For example, if a reader acquires snapshot isolation before a table is altered by the writer, it won't see the alteration until it commits its deferred transaction. I wish my schema cache would behave the same.
>>
>> To be precise, I only have two important needs:
>>
>> 1. A connection's schema cache is correct, which means that it never contains information that does not match SQLite's genuine view of the database schema. Being invalidated/empty is correct, if not efficient (the missing information is then loaded from SQLite).
>> 2. Synchronization points between readers and writers are avoided (non-blocking access is the whole point of WAL, and I want to avoid locks as much as possible)
>>
>> I was hoping that a connection would have a "schema version": an automatically incremented value that SQLite bumps whenever the schema is changed. That would have been enough for my use case. Unfortunately, PRAGMA schema_version reads the database header, and I thus guess that it does not play well with WAL (I'm not sure). Furthermore, PRAGMA schema_version clutters the tracing hook.
>>
>> The most simple solution I have is to invalidate a reader's schema cache each time it is used. This would unfortunately invalidate the readers' caches too often, since most real-life uses only alter the schema at application start-up, which means that the schema is, practically speaking, stable after this initialisation phase.
>>
>> Do any of you have any better idea?
>>
>> Thanks in advance, regards,
>> Gwendal Roué
>>
>> [1] https://sqlite.org/c3ref/set_authorizer.html
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users