Shared Cache vs. Serialized


Shared Cache vs. Serialized

Jesse Rittner
I've been reading about the shared cache feature and I have a question. It
seems to me that there are two ways we can have multiple threads all
accessing a database via a "single" connection. One, establish a single
serialized private cache connection up front, and give it to all the
threads. Two, have each thread independently establish a multi-thread shared
cache connection. What are the trade-offs between these two approaches? (I
am using "serialized" and "multi-thread" as they are defined here:
https://www.sqlite.org/threadsafe.html)

Also, if my application only establishes a single connection to a particular
database, does it matter at all if that connection uses a shared or private
cache? In particular, I am establishing a single multi-thread connection
that I protect with my own mutex. My assumption is "no", but I just want to
confirm.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Shared Cache vs. Serialized

Keith Medcalf

The "normal" connection mode is to have a separate connection for each thread, with no shared cache.  Each connection is opened with the FULLMUTEX (serialized) flag; this is the default.  Each connection is fully isolated from every other connection, so assuming that each "thread" has its own connection, each "thread" is isolated (database-wise) from every other thread by its own connection context.

You may "share" one connection amongst multiple threads provided that you "follow the entrance requirements".  Since database isolation is provided by the connection, multiple threads sharing the same connection share the same database context.
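As a sketch of that sharing pattern (illustrated here with Python's built-in sqlite3 module rather than the C API; the table name and thread/row counts are invented for the example):

```python
# One connection shared by several threads, guarded by our own lock.
# check_same_thread=False only disables Python's own per-thread guard;
# the lock below is what enforces the single-serial-entrance requirement.
import sqlite3
import threading

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE t (worker INTEGER, n INTEGER)")
lock = threading.Lock()

def worker(wid: int) -> None:
    for n in range(100):
        with lock:  # serialize every use of the shared connection
            conn.execute("INSERT INTO t VALUES (?, ?)", (wid, n))

threads = [threading.Thread(target=worker, args=(w,)) for w in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with lock:
    conn.commit()
    total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(total)  # 400
```

Note that all the threads share one database context here, which has consequences for transactions (see below in the thread).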

You may, if you must, reduce CPU usage by recovering the time spent executing the FULLMUTEX protections used in the default SERIALIZED mode: take responsibility for this yourself and tell the SQLite3 library not to cover your ass, instead using YOUR OWN CODE or design to enforce the single-serial-entrance requirement of the underlying SQLite3 engine.  You will "recover" the time taken within the library to enforce this protection, but you will spend that CPU time doing it yourself in your application, or you will ensure the application design is such that violation of the rules is not possible (perhaps forgoing some operating-system features that could "break" your design, such as DPCs or system-managed thread pools in Winders).  However, the amount of CPU time saved is small and the added complication is large.  In other words, you would do this only if it is cost-effective and each nanosecond matters (though it would probably be more cost-effective to just buy better hardware).

On the other hand, you may need to optimize memory usage at the expense of more CPU usage.  You do this by enabling "shared cache" so that multiple connections share the same cache, reducing the memory used (one cache shared amongst them rather than one cache per connection).  This might permit an otherwise well-designed application that uses multiple connections and threads to run on a memory-constrained device, trading memory for CPU (reduce memory, increase CPU), after determining that you cannot fix the defect by simply adding more memory (which is likely far more cost-efficient).  Since you are reducing memory usage by deliberately trading it for CPU (which increases to make up for the lack of memory), you have already decided that you do not want or need to optimize CPU usage (so there are pretty much no options here to do so).
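For illustration, two connections can share one cache via a named shared-cache in-memory database (sketched here with Python's sqlite3 module; the database name "demo" and the table are invented):

```python
# Two connections opened against the same shared cache: both see the same
# pages, rather than each holding its own private copy of the cache.
import sqlite3

uri = "file:demo?mode=memory&cache=shared"
a = sqlite3.connect(uri, uri=True)
b = sqlite3.connect(uri, uri=True)

a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")
a.commit()

# b reads through the same shared cache as a
row = b.execute("SELECT x FROM t").fetchone()
print(row[0])  # 1
```

With private caches (no `cache=shared`), each of these in-memory connections would instead get its own independent database.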

Using a single connection in shared cache mode (vs that same one connection in non-shared cache mode) does nothing to optimize memory (you still have one cache being used for your one connection) but does still use all the extra CPU processing associated with managing the shared cache, because that shared cache still has to be managed even though you are only using one "shared cache" connection at the moment.

Therefore, one probably ought to use the default (one connection per thread, non-shared cache, in serialized mode) and optimize IF AND ONLY IF it is found that optimization is required.  Otherwise, one might find that one has prematurely optimized for the wrong thing and then spends inordinate amounts of time correcting the error (or griping about it on the mailing list).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.





Re: Shared Cache vs. Serialized

Jens Alfke-2
In reply to this post by Jesse Rittner


> On Dec 30, 2018, at 7:52 AM, Jesse Rittner <[hidden email]> wrote:
>
> One, establish a single
> serialized private cache connection up front, and give it to all the
> threads. Two, have each thread independently establish a multi-thread shared
> cache connection. What are the trade-offs between these two approaches?

Be careful — both of these approaches lack isolation (the I in ACID) between the threads.

If you share a connection, then when one thread begins a transaction, all the other threads of course share in that transaction. In other words, thread B can see intermediate uncommitted writes from an ongoing transaction in thread A. If you don’t want this (and it tends not to be what you want in a typical GUI application) you’ll have to set up your own mutex to lock out all other threads during a transaction. But this reduces availability.
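A minimal illustration of that isolation gap (Python's sqlite3 module, on a single connection; in real code the two halves would run on different threads):

```python
# Two uses of the SAME connection share one transaction context, so
# "thread B" sees "thread A"'s uncommitted write.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("BEGIN")                      # "thread A" opens a transaction...
conn.execute("INSERT INTO t VALUES (42)")  # ...and writes, without committing

# ..."thread B", using the same connection, already sees the dirty row:
dirty = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

conn.execute("ROLLBACK")                   # A rolls back...
after = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(dirty, after)  # 1 0
```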

If you use shared-cache mode, it _seems_ as though your threads should be isolated since they have separate connections. But since the connections are sharing a cache, they still end up seeing uncommitted writes. (Caveat: This is from memory and may be wrong. I briefly turned on shared-cache mode a few years ago, and had to back it out a month later after a series of weird application bugs. IIRC, this is the behavior that led to the bugs. There is a web page explaining this somewhere on the site.)

That’s not to say you can’t use these approaches, but you have to design your code such that a transaction is global, not per-thread.

—Jens

Re: Shared Cache vs. Serialized

Jens Alfke-2
In reply to this post by Jesse Rittner


> On Dec 30, 2018, at 7:52 AM, Jesse Rittner <[hidden email]> wrote:
>
> It
> seems to me that there are two ways we can have multiple threads all
> accessing a database via a "single" connection.

Actually there is at least one more way: create a global connection pool. When a thread needs to access the database, it borrows a connection from the pool, does its work, and then returns it to the pool. This approach even preserves isolation, as long as a thread that opens a transaction doesn’t return the connection until after it commits.

If a thread wants a connection but the pool is empty, it can either open a new connection or it can block until another thread returns one. It’s a tradeoff between availability and memory usage, basically.

—Jens

Re: Shared Cache vs. Serialized

Jesse Rittner
Jens Alfke-2 wrote
> But since the connections are sharing a cache, they still end up seeing
> uncommitted writes.

I believe this is managed by the read_uncommitted pragma
(https://www.sqlite.org/pragma.html#pragma_read_uncommitted).

Here is another interesting distinction between shared and private cache mode
that I found while experimenting. Ordinarily, if connection A is in the middle
of fetching result rows from a SELECT (i.e., sqlite3_step was called, but not
sqlite3_reset), and connection B tries to do a CREATE/UPDATE/DELETE, B will
get back SQLITE_BUSY. However, with a shared cache, B's operation will
immediately succeed, provided it doesn't affect the same table(s).
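The same-table case and the read_uncommitted pragma can be reproduced with Python's sqlite3 module and a named shared-cache in-memory database (the URI and table name are just for the example, and the exact error/lock behaviour may vary with the SQLite build):

```python
# With a shared cache, reading a table another connection has write-locked
# normally fails with SQLITE_LOCKED; read_uncommitted lets the read proceed,
# at the cost of seeing uncommitted data.
import sqlite3

uri = "file:rudemo?mode=memory&cache=shared"
a = sqlite3.connect(uri, uri=True, isolation_level=None)
b = sqlite3.connect(uri, uri=True, isolation_level=None)
a.execute("CREATE TABLE t (x INTEGER)")

a.execute("BEGIN")
a.execute("INSERT INTO t VALUES (1)")  # a now holds a write lock on t

locked = False
try:
    b.execute("SELECT COUNT(*) FROM t").fetchone()
except sqlite3.OperationalError:       # SQLITE_LOCKED surfaces here
    locked = True

b.execute("PRAGMA read_uncommitted = 1")
dirty = b.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # dirty read
a.execute("ROLLBACK")
print(locked, dirty)
```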




Re: Shared Cache vs. Serialized

Keith Medcalf

>Another interesting distinction between shared and private cache mode
>I found while experimenting. Ordinarily if connection A is in the
>middle of fetching result rows from a SELECT (i.e., sqlite3_step was called,
>but not sqlite3_reset), and connection B tries to do a CREATE/UPDATE/DELETE,
>B will get back SQLITE_BUSY. However, with a shared cache, B's operation
>will immediately succeed, provided it doesn't affect the same table(s).

See also Write-Ahead Logging journal mode (WAL): in the case you specified, B's operation will succeed even if it is updating a table being read by connection A.

https://sqlite.org/wal.html
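A quick check of that WAL behaviour (Python's sqlite3 module, with a file-backed database since WAL does not apply to in-memory databases; the path and table are invented):

```python
# Under journal_mode=WAL, connection B can write to the very table that
# connection A is still stepping through, each with its own private cache.
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "wal.db")
a = sqlite3.connect(db_path)
a.execute("PRAGMA journal_mode=WAL")
a.execute("CREATE TABLE t (x INTEGER)")
a.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
a.commit()

b = sqlite3.connect(db_path)

cur = a.execute("SELECT x FROM t")        # A starts reading t...
first = cur.fetchone()

b.execute("INSERT INTO t VALUES (99)")    # ...while B writes the SAME table
b.commit()                                # succeeds under WAL

rest = cur.fetchall()                     # A keeps its snapshot: no row 99
print(first[0], len(rest))  # 0 9
```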






Re: Shared Cache vs. Serialized

Jesse Rittner
Keith Medcalf wrote
> See also Write Ahead Logging journal mode (WAL) where in the case you
> specified B's operation will succeed even if it is updating a table being
> read by connection A.

Oh, good to know! For anyone who stumbles across this thread in the future,
note that this requires separate private caches for connections A and B. With
a shared cache, you still get SQLITE_LOCKED even with write-ahead logging.


