Quantcast

Bitten by lack of isolation between SELECT and UPDATE on the same connection

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Bitten by lack of isolation between SELECT and UPDATE on the same connection

Jens Alfke-2
I’ve just run headlong in to the issues described in "No Isolation Between Operations On The Same Database Connection”. Specifically, I’ve discovered (after some debugging) that if I iterate over the the rows in a table using sqlite3_step, and update each row after it’s returned, Bad Stuff happens. Specifically, my query is just getting the first row over and over and over again, and the iteration runs forever. :(

I had been under the impression that, since I’m using the WAL, queries operate on a snapshot of the database as of the time they begin, and are unaffected by subsequent changes. I got this from reading about "snapshot isolation” in a previous section of that document. (Also, another key/value database engine I’ve used recently _does_ behave this way, so it’s what I was expecting.) I now see that the “read transaction” described in that section has to be occurring in a different connection than the write transaction. (Right?)

I’m unsure what to do now. I am working on a library whose API exposes iterator objects that run queries; the iterator’s “next()” method internally calls sqlite3_step. Thus the interleaving of the query and updating the database is not under my control; it’s up to the developer using our library, and I do _not_ want to expose inconvenient undefined behavior like this, or tell developers that “you can’t modify the database while you’re iterating it”.

I can’t be the first person to run into this. Is there a best practice for enabling concurrent iteration and mutation? I can think of two solutions:

A. Batch up all of the query results in memory at the start of the iteration, and have the iterator just read them out of the in-memory list.
        I’d like to avoid this because of the obvious memory overhead and latency imposed on large queries. Version 1 of our library worked this way, which is why I probably hadn’t noticed the problem until now.

B. Create a separate SQLite connection for the query; then it’ll be isolated from any changes being made in the main connection.
        This seems elegant, but it will of course use more memory for the extra connection (with its own cache.) Moreover, it seems like I’ll need to open an indefinite number of extra connections: if the caller starts a query, makes some changes, and then starts another query (before reading the final row of the first query), I need to open another connection for the second query because it has to see the changes, which aren’t yet visible in the first query's connection … right?

—Jens

[1]: https://www.sqlite.org/isolation.html
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Rowan Worth-2
The iterator pattern has another caveat when applied to sqlite:

  foreach (row in statement) {
      if (isMatch(row)) {
          return true
      }
  }
  return false

If the iterator isn't exhausted, how do you know when to dispose the
sqlite3_stmt? There are other ways to manage the statement's lifetime so
this isn't a deal breaker, just something to keep in mind.


I know that our code base uses step/UPDATE/step/UPDATE/... in a couple of
places, without problems. But I guess that is just luck; as you note the
documentation clearly says the behaviour is undefined. To repeatedly get
the _same_ row over and over seems incredibly unfortunate though!

-Rowan


On 31 January 2017 at 11:29, Jens Alfke <[hidden email]> wrote:

> I’ve just run headlong in to the issues described in "No Isolation Between
> Operations On The Same Database Connection”. Specifically, I’ve discovered
> (after some debugging) that if I iterate over the the rows in a table using
> sqlite3_step, and update each row after it’s returned, Bad Stuff happens.
> Specifically, my query is just getting the first row over and over and over
> again, and the iteration runs forever. :(
>
> I had been under the impression that, since I’m using the WAL, queries
> operate on a snapshot of the database as of the time they begin, and are
> unaffected by subsequent changes. I got this from reading about "snapshot
> isolation” in a previous section of that document. (Also, another key/value
> database engine I’ve used recently _does_ behave this way, so it’s what I
> was expecting.) I now see that the “read transaction” described in that
> section has to be occurring in a different connection than the write
> transaction. (Right?)
>
> I’m unsure what to do now. I am working on a library whose API exposes
> iterator objects that run queries; the iterator’s “next()” method
> internally calls sqlite3_step. Thus the interleaving of the query and
> updating the database is not under my control; it’s up to the developer
> using our library, and I do _not_ want to expose inconvenient undefined
> behavior like this, or tell developers that “you can’t modify the database
> while you’re iterating it”.
>
> I can’t be the first person to run into this. Is there a best practice for
> enabling concurrent iteration and mutation? I can think of two solutions:
>
> A. Batch up all of the query results in memory at the start of the
> iteration, and have the iterator just read them out of the in-memory list.
>         I’d like to avoid this because of the obvious memory overhead and
> latency imposed on large queries. Version 1 of our library worked this way,
> which is why I probably hadn’t noticed the problem until now.
>
> B. Create a separate SQLite connection for the query; then it’ll be
> isolated from any changes being made in the main connection.
>         This seems elegant, but it will of course use more memory for the
> extra connection (with its own cache.) Moreover, it seems like I’ll need to
> open an indefinite number of extra connections: if the caller starts a
> query, makes some changes, and then starts another query (before reading
> the final row of the first query), I need to open another connection for
> the second query because it has to see the changes, which aren’t yet
> visible in the first query's connection … right?
>
> —Jens
>
> [1]: https://www.sqlite.org/isolation.html
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Simon Slavin-3
In reply to this post by Jens Alfke-2

On 31 Jan 2017, at 3:29am, Jens Alfke <[hidden email]> wrote:

> I’ve discovered (after some debugging) that if I iterate over the the rows in a table using sqlite3_step, and update each row after it’s returned, Bad Stuff happens. Specifically, my query is just getting the first row over and over and over again, and the iteration runs forever

Is your UPDATE command changing a value which is used for the SELECT ?  If so, what you reported is expected behaviour.  If you’re expecting to execute two statements at the same time you should be using two connections.

> I’m unsure what to do now. I am working on a library whose API exposes iterator objects that run queries; the iterator’s “next()” method internally calls sqlite3_step.

Nope.  Cannot do that.  Any number of things might happen between the first _step() and the _finalize().  For all you know someone might delete the object the iterator is currently on instead of just updating it.  Then where would the iterator be ?  How would you know to release the resource ?  And once you knew it, how would you do it ?

One solution is to make each call to .next() do its own SELECT.  So if, for example, it was acceptable to iterate the rows in rowid order then calling .next() would do

SELECT rowid FROM MyTable WHERE rowid > [current rowid] ORDER BY rowid LIMIT 1

If this gives a row, that’s your next object.  If it doesn’t, you’ve reached the end.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Jens Alfke-2
In reply to this post by Rowan Worth-2

> On Jan 30, 2017, at 8:03 PM, Rowan Worth <[hidden email]> wrote:
>
> If the iterator isn't exhausted, how do you know when to dispose the
> sqlite3_stmt?

The iterator (which is an Objective-C NSEnumerator object) will be deleted shortly after it exits scope. Some of the refcounting is deferred via the autorelease pool, but basically by the time the thread returns back to its event loop.

—Jens

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Jens Alfke-2
In reply to this post by Simon Slavin-3

> On Jan 30, 2017, at 9:10 PM, Simon Slavin <[hidden email]> wrote:
>
> Nope.  Cannot do that.  Any number of things might happen between the first _step() and the _finalize().  For all you know someone might delete the object the iterator is currently on instead of just updating it.  Then where would the iterator be ?  

As I explained, my assumption was that the iteration operated on a snapshot of the database at the time it was started, i.e. at the first call to sqlite3_step. There are other databases that operate that way, although I now understand SQLite doesn’t.

I understand the situation. And I outlined two ways around the problem. So this isn’t a blanket “Cannot do that” situation, unless you’re saying I can’t do it the way I’ve been dong it … but I already know that! That’s why I posted my question. Reiterating it doesn’t help.

> How would you know to release the resource ?  And once you knew it, how would you do it ?

I don’t follow. What’s the “resource” you’re talking about here?

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Simon Slavin-3

On 31 Jan 2017, at 5:26am, Jens Alfke <[hidden email]> wrote:

> I don’t follow. What’s the “resource” you’re talking about here?

In your case, the NSEnumerator .

Would the solution I proposed in my post work for you ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Hick Gunter
In reply to this post by Jens Alfke-2
Maybe adding "order by rowid" to your select statement can help avoid "sawing off the branch you are sitting on". Unless you need to update rowids...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Jens Alfke
Gesendet: Dienstag, 31. Jänner 2017 04:30
An: SQLite mailing list <[hidden email]>
Betreff: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

I’ve just run headlong in to the issues described in "No Isolation Between Operations On The Same Database Connection”. Specifically, I’ve discovered (after some debugging) that if I iterate over the the rows in a table using sqlite3_step, and update each row after it’s returned, Bad Stuff happens. Specifically, my query is just getting the first row over and over and over again, and the iteration runs forever. :(

I had been under the impression that, since I’m using the WAL, queries operate on a snapshot of the database as of the time they begin, and are unaffected by subsequent changes. I got this from reading about "snapshot isolation” in a previous section of that document. (Also, another key/value database engine I’ve used recently _does_ behave this way, so it’s what I was expecting.) I now see that the “read transaction” described in that section has to be occurring in a different connection than the write transaction. (Right?)

I’m unsure what to do now. I am working on a library whose API exposes iterator objects that run queries; the iterator’s “next()” method internally calls sqlite3_step. Thus the interleaving of the query and updating the database is not under my control; it’s up to the developer using our library, and I do _not_ want to expose inconvenient undefined behavior like this, or tell developers that “you can’t modify the database while you’re iterating it”.

I can’t be the first person to run into this. Is there a best practice for enabling concurrent iteration and mutation? I can think of two solutions:

A. Batch up all of the query results in memory at the start of the iteration, and have the iterator just read them out of the in-memory list.
        I’d like to avoid this because of the obvious memory overhead and latency imposed on large queries. Version 1 of our library worked this way, which is why I probably hadn’t noticed the problem until now.

B. Create a separate SQLite connection for the query; then it’ll be isolated from any changes being made in the main connection.
        This seems elegant, but it will of course use more memory for the extra connection (with its own cache.) Moreover, it seems like I’ll need to open an indefinite number of extra connections: if the caller starts a query, makes some changes, and then starts another query (before reading the final row of the first query), I need to open another connection for the second query because it has to see the changes, which aren’t yet visible in the first query's connection … right?

—Jens

[1]: https://www.sqlite.org/isolation.html
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

James K. Lowden
In reply to this post by Jens Alfke-2
On Mon, 30 Jan 2017 19:29:40 -0800
Jens Alfke <[hidden email]> wrote:

> if I iterate over the the rows in a table using sqlite3_step, and
> update each row after it?s returned, Bad Stuff happens. Specifically,
> my query is just getting the first row over and over and over again,
> and the iteration runs forever.  

I think you've solved your immediate problem and come to grips with
SQLite's behavior.  I thought it might be helpful to explain why it
works that way, and that what you want is also valid, but goes by
another name: a cursor.  

According the SQL standard, every SQL statement is atomic.  SELECT has
no beginning and no end: the results it returns reflect the state of
the database as of the moment the statement was executed.  If you fetch
the last row six days after the first, it still belongs to the database
as it stood when you began.  

SQLite in WAL mode gives you that isolation.  You weren't bitten by a
*lack* of isolation; you were bitten by isolation you didn't expect.  

The idea of updating rows as they are read -- without completing the
transaction -- is supported in SQL with a cursor.  Standard SQL has
DECLARE CURSOR syntax; some cursors can be declared as FOR UPDATE and
have behavior much like what you expected.  That syntax, as you know, is
not supported by SQLite.  

Technically speaking the product of ORDER BY is also a cursor, and it's
not hard to find references to sqlite3_step as using "a cursor".  All
meaning stands in context, and those should not be confused with an SQL
cursor.  

--jkl


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Jens Alfke-2

> On Jan 31, 2017, at 9:39 AM, James K. Lowden <[hidden email]> wrote:
>
> According the SQL standard, every SQL statement is atomic.  SELECT has
> no beginning and no end: the results it returns reflect the state of
> the database as of the moment the statement was executed.  If you fetch
> the last row six days after the first, it still belongs to the database
> as it stood when you began.  

That is the behavior I was assuming and desiring, but it’s not what actually occurs. If there are concurrent mutations in the same connection, the rows returned by SELECT do _not_ reflect the prior state of the database, but suffer from “undefined” behavior. In other words, there is a lack of isolation between the SELECT and the concurrent UPDATEs.

It’s possible I’m misunderstanding your point, though!

My immediate workaround (implemented last night) is to iterate over the statement at the moment the query is run, saving all the rows in memory. Our enumerator object then just reads and returns successive rows from that list.

In the medium term I have ideas for optimizations that can let us avoid this memory hit in most circumstances (since most queries are not made at the same time as mutations.) For example, I could use the original enumerator behavior by default, but when the client requests a mutation I first notify all in-progress enumerators [on that connection], which will immediately read the rest of their rows into memory.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Richard Hipp-3
On 1/31/17, Jens Alfke <[hidden email]> wrote:
>
> My immediate workaround (implemented last night) is to iterate over the
> statement at the moment the query is run, saving all the rows in memory. Our
> enumerator object then just reads and returns successive rows from that
> list.

That's how client/server SQL database engines do it.  They run the
query to completion before starting on any of the updates.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Bitten by lack of isolation between SELECT and UPDATE on the same connection

Scott Robison-2
In reply to this post by Jens Alfke-2
On Tue, Jan 31, 2017 at 12:15 PM, Jens Alfke <[hidden email]> wrote:

>
> > On Jan 31, 2017, at 9:39 AM, James K. Lowden <[hidden email]>
> wrote:
> >
> > According the SQL standard, every SQL statement is atomic.  SELECT has
> > no beginning and no end: the results it returns reflect the state of
> > the database as of the moment the statement was executed.  If you fetch
> > the last row six days after the first, it still belongs to the database
> > as it stood when you began.
>
> That is the behavior I was assuming and desiring, but it’s not what
> actually occurs. If there are concurrent mutations in the same connection,
> the rows returned by SELECT do _not_ reflect the prior state of the
> database, but suffer from “undefined” behavior. In other words, there is a
> lack of isolation between the SELECT and the concurrent UPDATEs.
>
> It’s possible I’m misunderstanding your point, though!
>
> My immediate workaround (implemented last night) is to iterate over the
> statement at the moment the query is run, saving all the rows in memory.
> Our enumerator object then just reads and returns successive rows from that
> list.
>
> In the medium term I have ideas for optimizations that can let us avoid
> this memory hit in most circumstances (since most queries are not made at
> the same time as mutations.) For example, I could use the original
> enumerator behavior by default, but when the client requests a mutation I
> first notify all in-progress enumerators [on that connection], which will
> immediately read the rest of their rows into memory.
>

I think you said something earlier about a fear that the record set might
be too big to fit in memory (or wanting to avoid that possibility). You
could select the record set you want to a temp table then select *that*
while running updates on the original tables. Probably something you
already thought of (or maybe I subconsciously read it from someone else
already; sorry if adding noise), but thought I'd toss it out.

SDR
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...