Most efficient way to detect on-disk change

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Most efficient way to detect on-disk change

wmertens
I'm working with a db that's only written to in transations, and each
transaction increases a db-global version counter.

This means that I can cache all reads, unless the version changed.

What would be the most efficient way to make sure I *never* serve stale
data?

Right now everything's a single process, so it's really easy, just clear
the cache on every write. However, I want to be prepared for the near
future where I will have multiple processes using this db file.

I'm thinking that to detect writes, this might be a safe approach:

Before serving any cached read, check the timestamp on the wal file. If it
changed, read the global version. If it changed, clear the cache.
Otherwise, serve the cached read.

Is it safe to assume that all writes would mean change of the wal file
timestamp?
More importantly, is it faster to check the timestamp or would a prepared
query for the version actually be faster (and safer)?

Also, I'm using WAL right now, but I wonder if that's really useful given
the single-writer-at-a-time?

Thank you for your insights,

Wout.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

David Raymond
I think pragma data_version is what you're looking for.
http://www.sqlite.org/pragma.html#pragma_data_version


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Wout Mertens
Sent: Tuesday, November 07, 2017 1:08 PM
To: SQLite mailing list
Subject: [sqlite] Most efficient way to detect on-disk change

I'm working with a db that's only written to in transations, and each
transaction increases a db-global version counter.

This means that I can cache all reads, unless the version changed.

What would be the most efficient way to make sure I *never* serve stale
data?

Right now everything's a single process, so it's really easy, just clear
the cache on every write. However, I want to be prepared for the near
future where I will have multiple processes using this db file.

I'm thinking that to detect writes, this might be a safe approach:

Before serving any cached read, check the timestamp on the wal file. If it
changed, read the global version. If it changed, clear the cache.
Otherwise, serve the cached read.

Is it safe to assume that all writes would mean change of the wal file
timestamp?
More importantly, is it faster to check the timestamp or would a prepared
query for the version actually be faster (and safer)?

Also, I'm using WAL right now, but I wonder if that's really useful given
the single-writer-at-a-time?

Thank you for your insights,

Wout.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Simon Slavin-3


On 7 Nov 2017, at 6:53pm, David Raymond <[hidden email]> wrote:

> I think pragma data_version is what you're looking for.
> http://www.sqlite.org/pragma.html#pragma_data_version

An excellent solution, though there’s a /caveat/.  From the original post:

> Right now everything's a single process, so it's really easy, just clear
> the cache on every write. However, I want to be prepared for the near
> future where I will have multiple processes using this db file.

Documentation for the PRAGMA says

" The "PRAGMA data_version" value is a local property of each database connection and so values returned by two concurrent invocations of "PRAGMA data_version" on separate database connections are often different even though the underlying database is identical. "

So when you convert your code to use multiple processes, they must all use the same connection for this to work properly.  That means you will have to invent a method to ensure that only one of them makes changes at a time.  That might not be what you wanted to do.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Jens Alfke-2


> On Nov 7, 2017, at 11:41 AM, Simon Slavin <[hidden email]> wrote:
>
> So when you convert your code to use multiple processes, they must all use the same connection for this to work properly.

No; it just means each process will track its own data-version value based on its own connection. It doesn’t matter if those values are different in different processes.

(In any case, it’s impossible for multiple processes to [directly] use the same SQLite connection, since a connection is an in-memory object.)

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Keith Medcalf
In reply to this post by wmertens

So you are caching data at the application level that is cached at the database page cache level which is cached in the Operating System file cache that lives in a file residing on disk -- effectively storing three copies of the data in memory.

What advantage does your third-level (application) cache provide that is not provided by the two lower level caches?  In other words if the data is already stored in the page cache and (that which isn't) is already in the OS file cache (which it must be since you read the data once already), what benefit does the third-level cache provide other than add the overhead and complication of its management?

Likely the most "efficient" way to ensure you do not serve stale data is to get rid of the application level caching and simply re-retrieve the data when you need it.  Unless of course you are seriously memory constrained such at running this on a model 5150.  Or your query's are very long and complicated.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Wout Mertens
>Sent: Tuesday, 7 November, 2017 11:08
>To: SQLite mailing list
>Subject: [sqlite] Most efficient way to detect on-disk change
>
>I'm working with a db that's only written to in transations, and each
>transaction increases a db-global version counter.
>
>This means that I can cache all reads, unless the version changed.
>
>What would be the most efficient way to make sure I *never* serve
>stale
>data?
>
>Right now everything's a single process, so it's really easy, just
>clear
>the cache on every write. However, I want to be prepared for the near
>future where I will have multiple processes using this db file.
>
>I'm thinking that to detect writes, this might be a safe approach:
>
>Before serving any cached read, check the timestamp on the wal file.
>If it
>changed, read the global version. If it changed, clear the cache.
>Otherwise, serve the cached read.
>
>Is it safe to assume that all writes would mean change of the wal
>file
>timestamp?
>More importantly, is it faster to check the timestamp or would a
>prepared
>query for the version actually be faster (and safer)?
>
>Also, I'm using WAL right now, but I wonder if that's really useful
>given
>the single-writer-at-a-time?
>
>Thank you for your insights,
>
>Wout.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

J Decker
In reply to this post by wmertens
in linux inotify - http://man7.org/linux/man-pages/man7/inotify.7.html
in windows  FindFirstChangeNotification/FindNextChangeNotification
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365261(v=vs.85).aspx

if you wait for an actual change before checking to see if there really was
a change....
and can prevent arbitrary polling

On Tue, Nov 7, 2017 at 10:07 AM, Wout Mertens <[hidden email]>
wrote:

> I'm working with a db that's only written to in transations, and each
> transaction increases a db-global version counter.
>
> This means that I can cache all reads, unless the version changed.
>
> What would be the most efficient way to make sure I *never* serve stale
> data?
>
> Right now everything's a single process, so it's really easy, just clear
> the cache on every write. However, I want to be prepared for the near
> future where I will have multiple processes using this db file.
>
> I'm thinking that to detect writes, this might be a safe approach:
>
> Before serving any cached read, check the timestamp on the wal file. If it
> changed, read the global version. If it changed, clear the cache.
> Otherwise, serve the cached read.
>
> Is it safe to assume that all writes would mean change of the wal file
> timestamp?
> More importantly, is it faster to check the timestamp or would a prepared
> query for the version actually be faster (and safer)?
>
> Also, I'm using WAL right now, but I wonder if that's really useful given
> the single-writer-at-a-time?
>
> Thank you for your insights,
>
> Wout.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Eduardo Morras-2
In reply to this post by wmertens
On Tue, 07 Nov 2017 18:07:42 +0000
Wout Mertens <[hidden email]> wrote:

> I'm working with a db that's only written to in transations, and each
> transaction increases a db-global version counter.
>
> This means that I can cache all reads, unless the version changed.
>
> What would be the most efficient way to make sure I *never* serve
> stale data?
>
> Right now everything's a single process, so it's really easy, just
> clear the cache on every write. However, I want to be prepared for
> the near future where I will have multiple processes using this db
> file.
>
> I'm thinking that to detect writes, this might be a safe approach:
>
> Before serving any cached read, check the timestamp on the wal file.
> If it changed, read the global version. If it changed, clear the
> cache. Otherwise, serve the cached read.
>
> Is it safe to assume that all writes would mean change of the wal file
> timestamp?
> More importantly, is it faster to check the timestamp or would a
> prepared query for the version actually be faster (and safer)?
>
> Also, I'm using WAL right now, but I wonder if that's really useful
> given the single-writer-at-a-time?

You can define triggers on insert, update and delete that fires a user defined function that warns your other threads or an external process (I use a similar setup on a AS400/DB2).
Something like this:

CREATE TRIGGER tg_night_watcher_insert AFTER INSERT ON table_name_to_watch_up
 BEGIN
  SELECT your_nigth_watcher();
 END

HTH

>
> Thank you for your insights,
>
> Wout.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


---   ---
Eduardo Morras <[hidden email]>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Jens Alfke-2
In reply to this post by Keith Medcalf


> On Nov 7, 2017, at 2:39 PM, Keith Medcalf <[hidden email]> wrote:
>
> What advantage does your third-level (application) cache provide that is not provided by the two lower level caches?

You’re being presumptuous here. It’s pretty common for presentation-level data to be very expensive to recompute; a common example is partial or full web pages. Rendering image, like graphs, can also be expensive. Or hey, how about the results of a complex SQL query?

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Simon Slavin-3


On 7 Nov 2017, at 11:12pm, Jens Alfke <[hidden email]> wrote:

> On Nov 7, 2017, at 2:39 PM, Keith Medcalf <[hidden email]> wrote:
>
>> What advantage does your third-level (application) cache provide that is not provided by the two lower level caches?
>
> You’re being presumptuous here. It’s pretty common for presentation-level data to be very expensive to recompute; a common example is partial or full web pages. Rendering image, like graphs, can also be expensive. Or hey, how about the results of a complex SQL query?

It’s caching at two different levels.  The level Keith is talking about happens when you submit a SQL query and cache the data retrieved.  The level Jens is talking about happens when you submit a SQL query, process the data, then cache the results of the processing.

And it’s all highly dependent on the hardware, the amount of data and the type of processing.  And we don’t know whether the OP did premature optimisation.  So let’s forget this threadlet until and unless we have relevant data.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Dan Kennedy-4
In reply to this post by Simon Slavin-3
On 11/08/2017 02:41 AM, Simon Slavin wrote:

>
> On 7 Nov 2017, at 6:53pm, David Raymond <[hidden email]> wrote:
>
>> I think pragma data_version is what you're looking for.
>> http://www.sqlite.org/pragma.html#pragma_data_version
> An excellent solution, though there’s a /caveat/.  From the original post:
>
>> Right now everything's a single process, so it's really easy, just clear
>> the cache on every write. However, I want to be prepared for the near
>> future where I will have multiple processes using this db file.
> Documentation for the PRAGMA says
>
> " The "PRAGMA data_version" value is a local property of each database connection and so values returned by two concurrent invocations of "PRAGMA data_version" on separate database connections are often different even though the underlying database is identical."
>
> So when you convert your code to use multiple processes, they must all use the same connection for this to work properly.  That means you will have to invent a method to ensure that only one of them makes changes at a time.  That might not be what you wanted to do.


I think it's the opposite. For connection A, the value of "PRAGMA
data_version" does not change as a result of commits by connection A. It
changes if the db is modified by any other connection, regardless of
whether or not that other connection resides in a different process or not.

"The integer values returned by two invocations of "PRAGMA data_version"
from the same connection will be different if changes were committed to
the database by any other connection in the interim. The "PRAGMA
data_version" value is unchanged for commits made on the same database
connection."

Dan.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Most efficient way to detect on-disk change

Hick Gunter
In reply to this post by wmertens
There is also http://sqlite.org/pragma.html#pragma_user_version which more closely resembles what you have now.

I strongly suspect that an update cycle of the user_version should be done within the transaction performing the changes.

BEGIN
Read user version
<your changes go here>
Write updated user version
COMMIT


-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Wout Mertens
Gesendet: Dienstag, 07. November 2017 19:08
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] [sqlite] Most efficient way to detect on-disk change

I'm working with a db that's only written to in transations, and each transaction increases a db-global version counter.

This means that I can cache all reads, unless the version changed.

What would be the most efficient way to make sure I *never* serve stale data?

Right now everything's a single process, so it's really easy, just clear the cache on every write. However, I want to be prepared for the near future where I will have multiple processes using this db file.

I'm thinking that to detect writes, this might be a safe approach:

Before serving any cached read, check the timestamp on the wal file. If it changed, read the global version. If it changed, clear the cache.
Otherwise, serve the cached read.

Is it safe to assume that all writes would mean change of the wal file timestamp?
More importantly, is it faster to check the timestamp or would a prepared query for the version actually be faster (and safer)?

Also, I'm using WAL right now, but I wonder if that's really useful given the single-writer-at-a-time?

Thank you for your insights,

Wout.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Dominique Devienne
In reply to this post by Dan Kennedy-4
On Wed, Nov 8, 2017 at 7:45 AM, Dan Kennedy <[hidden email]> wrote:

> On 7 Nov 2017, at 6:53pm, David Raymond <[hidden email]> wrote:
>>
>> I think pragma data_version is what you're looking for.
>>> http://www.sqlite.org/pragma.html#pragma_data_version
>>>
>>
> I think it's the opposite. For connection A, the value of "PRAGMA
> data_version" does not change as a result of commits by connection A. It
> changes if the db is modified by any other connection, regardless of
> whether or not that other connection resides in a different process or not.
>
> "The integer values returned by two invocations of "PRAGMA data_version"
> from the same connection will be different if changes were committed to the
> database by any other connection in the interim. The "PRAGMA data_version"
> value is unchanged for commits made on the same database connection."


Hi Dan. So you confirm David's answer, provided OP also tracks change made
by the local connection, in addition to tracking pragma data_version?
I just want to make sure I understand your answer correctly. Thanks, --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

Dan Kennedy-4
On 11/08/2017 03:55 PM, Dominique Devienne wrote:

> On Wed, Nov 8, 2017 at 7:45 AM, Dan Kennedy <[hidden email]> wrote:
>
>> On 7 Nov 2017, at 6:53pm, David Raymond <[hidden email]> wrote:
>>> I think pragma data_version is what you're looking for.
>>>> http://www.sqlite.org/pragma.html#pragma_data_version
>>>>
>> I think it's the opposite. For connection A, the value of "PRAGMA
>> data_version" does not change as a result of commits by connection A. It
>> changes if the db is modified by any other connection, regardless of
>> whether or not that other connection resides in a different process or not.
>>
>> "The integer values returned by two invocations of "PRAGMA data_version"
>> from the same connection will be different if changes were committed to the
>> database by any other connection in the interim. The "PRAGMA data_version"
>> value is unchanged for commits made on the same database connection."
>
> Hi Dan. So you confirm David's answer, provided OP also tracks change made
> by the local connection, in addition to tracking pragma data_version?

That's right.

The original use case was an application-level cache of objects
associated with a single database connection. The cache should be
invalidated whenever the database is written. So the app would:

   a) invalidate the cache whenever it wrote to the db, and
   b) checked that "PRAGMA data_version" has not changed before using an
object from the cache (and invalidating the entire cache it if it had).

I guess the logic was that the app could implement more fine-grained
cache invalidation in (a) if required.

Dan.







> I just want to make sure I understand your answer correctly. Thanks, --DD
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Most efficient way to detect on-disk change

wmertens
Thank you all! I really have to do a thorough read of all the pragmas,
there are so many useful things in there! The user version sounds exactly
like what I should be using for storing the db version, and presumably the
data_version is a little faster still than reading the user version.

@Keith, the reason I'm caching is because the data itself is reasonably
small, and while I can query + parse the JSON in <2ms, using cached data
from memory takes microseconds…

Besides, memory is relatively cheap these days, and in general (safe)
caching is beneficial. I like reading about
https://en.wikipedia.org/wiki/Cache-oblivious_algorithm - any amount of
cache can improve performance with these…

On Wed, Nov 8, 2017 at 12:22 PM Dan Kennedy <[hidden email]> wrote:

> On 11/08/2017 03:55 PM, Dominique Devienne wrote:
> > On Wed, Nov 8, 2017 at 7:45 AM, Dan Kennedy <[hidden email]>
> wrote:
> >
> >> On 7 Nov 2017, at 6:53pm, David Raymond <[hidden email]>
> wrote:
> >>> I think pragma data_version is what you're looking for.
> >>>> http://www.sqlite.org/pragma.html#pragma_data_version
> >>>>
> >> I think it's the opposite. For connection A, the value of "PRAGMA
> >> data_version" does not change as a result of commits by connection A. It
> >> changes if the db is modified by any other connection, regardless of
> >> whether or not that other connection resides in a different process or
> not.
> >>
> >> "The integer values returned by two invocations of "PRAGMA data_version"
> >> from the same connection will be different if changes were committed to
> the
> >> database by any other connection in the interim. The "PRAGMA
> data_version"
> >> value is unchanged for commits made on the same database connection."
> >
> > Hi Dan. So you confirm David's answer, provided OP also tracks change
> made
> > by the local connection, in addition to tracking pragma data_version?
>
> That's right.
>
> The original use case was an application-level cache of objects
> associated with a single database connection. The cache should be
> invalidated whenever the database is written. So the app would:
>
>    a) invalidate the cache whenever it wrote to the db, and
>    b) checked that "PRAGMA data_version" has not changed before using an
> object from the cache (and invalidating the entire cache it if it had).
>
> I guess the logic was that the app could implement more fine-grained
> cache invalidation in (a) if required.
>
> Dan.
>
>
>
>
>
>
>
> > I just want to make sure I understand your answer correctly. Thanks, --DD
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users