Endless loop possible with simultaneous SELECT and UPDATE?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Endless loop possible with simultaneous SELECT and UPDATE?

Alexander Vega
 Having read :  https://www.sqlite.org/isolation.html
Specifically the line "And the application can UPDATE the current row or
any prior row, though doing so might cause that row to reappear in a
subsequent sqlite3_step()."

Is it possible to create and endless loop with the following (pseudo)code?

#define SELECT_EXPIRE_INFO \
"SELECT Auth_id, expiration FROM AuthTable;"

#define UPDATE_SESID_EXPIRED \
"UPDATE AuthTable SET sesCookie=?, expiration=? WHERE Auth_id=?;"

static void
expire_sesid(void)
{

/* get raw current time */
current_time = get_current_db_time(false);

/* prepare SQL queries */
sqlite3_prepare_v2(db,  SELECT_EXPIRE_INFO,
-1, &expire_info, NULL);
sqlite3_prepare_v2(db,  UPDATE_SESID_EXPIRED,
-1, &update_ses_expired, NULL);

/* while there is work to be done */
while (sqlite3_step(expire_info) == SQLITE_ROW) {
    auth_id = sqlite3_column_int(expire_info, 0);   /* auth_id */
    expiration_time = sqlite3_column_int64(expire_info, 1);   /* expiration
*/

    /* if the session is expired, today is greater than expiration date */
    if ( current_time >  expiration_time ) {
      /* generate new invalid session id */
      generate_ses_id(ses_id);
      /* invalidate ses_id and set internal expiration to a year ahead, log
in
      will set it to a month for user log in */
      sqlite3_bind_text(update_ses_expired, 1, ses_id,
             16, SQLITE_STATIC);
      sqlite3_bind_int64(update_ses_expired, 2, current_time_plus_year);
      sqlite3_bind_int(update_ses_expired, 3, auth_id);
      sqlite3_step(update_ses_expired);
      sqlite3_reset(update_ses_expired);
      }
  }
/* all work has completed */
sqlite3_finalize(expire_info);
sqlite3_finalize(update_ses_expired);
return;
}

I appreciate everyone's time,
-Alex V
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Endless loop possible with simultaneous SELECT and UPDATE?

Keith Medcalf

> Having read :  https://www.sqlite.org/isolation.html
> Specifically the line "And the application can UPDATE the current row
> or any prior row, though doing so might cause that row to reappear in a
> subsequent sqlite3_step()."

> Is it possible to create and endless loop

Eventually you will have no more rows to update and therefore the underlying structures become stable and the select loop will eventually run out of rows, though you may or may not have visited all rows, and may visit some rows two or more times (once before update and more than once after).

If you change the outer query to "select auth_id, expiration from AuthTable order by +auth_id, +expiration" then you will PROBABLY never have a problem since the results will LIKELY be from a sorter and not from the underlying table, and therefore mutation of the underlying tables and indexes will not interfere with the result of the outer select, even if those mutations affect the AuthTable or the indexes on it.  Some SQL varients use a FOR UPDATE clause on a SELECT to tell the query planner that you intend to dally-about with the underlying datastore without having the proper isolation in place.  The SQLite way of doing this is by requesting a row sorter not dependent on indexes by using the +columnname syntax in an order by on the select.

Your outer query should probably be "select auth_id, expiration from AuthTable where expiration <= ? order by +auth_id, +expiration" and binding current_time as the parameter since there is no point in retrieving rows that you will not be updating is there?

The correct solution is, of course, to use separate connections so that you have isolation between the select and the updates.  

You SHOULD be executing the outer select on one connection and the updates on another connection.  This will work for journal mode delete unless the number of changed pages is too large to fit in sqlite's cache, in which case you may get an error from the update statement when it needs to spill the cache, and you will need to kaibosh the whole thing and do the updates in smaller chunks by putting a limit on the outer select and looping the whole thing until there are no more rows to process.  (or increase the cache_size to be sufficient).

You can avoid that particular problem by having the database in journal_mode=WAL in which case you can even process each update in its own transaction if you wish (get rid of the db2.beginimmediate() and db2.commit(), though then you will have to handle the eventuality of getting errors on the UPDATE).

db1 = Connection('database.db')
db1.executescript('pragma journal_mode=WAL;')
db2 = Connection('database.db')
current_time = datetime.now()
current_time_plus_one_year = current_time.add(years=1)
sess_id = ... some constant ...
db2.beginimmediate()
for row in db1.execute('select auth_id, expiration from authtable where expiration <= ?;',
                       (current_time,)):
    db2.execute('update authtable set sesCookie = ?, expiration = ? where auth_id = ?;',
                (generate_ses_id(sess_id), current_time_plus_one_year, row.auth_id,))
db2.commit()

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Endless loop possible with simultaneous SELECT and UPDATE?

Alexander Vega
Thank you Keith for your answer. It has led me to more questions.

"though you may or may not have visited all rows"
From the documentation I did not get the impression that you would ever not
visit ALL ROWS at least once. Is there a technical reason for this? I would
assume a full table scan is walking the un-ordered leaf pages of the B*
tree?

"Your outer query should probably be "select auth_id, expiration from
AuthTable where expiration <= ? order by +auth_id, +expiration" and binding
current_time as the parameter since there is no point in retrieving rows
that you will not be updating is there?  "
You are correct that does make sense. I guess I was trying avoid any
ambiguities of a WHERE clause on the SELECT because I do not understand its
behavior in this circumstance.

You mentioned two database connections to the same database. Is this going
to work if I am using Threadsafe mode = 0? Would the second connection be
done through an attach?

Does this conversation change if I wrap the whole select and updates in one
transaction? e.g. BEGIN...END

Thanks



On Sun, Sep 1, 2019 at 1:32 AM Keith Medcalf <[hidden email]> wrote:

>
> > Having read :  https://www.sqlite.org/isolation.html
> > Specifically the line "And the application can UPDATE the current row
> > or any prior row, though doing so might cause that row to reappear in a
> > subsequent sqlite3_step()."
>
> > Is it possible to create and endless loop
>
> Eventually you will have no more rows to update and therefore the
> underlying structures become stable and the select loop will eventually run
> out of rows, though you may or may not have visited all rows, and may visit
> some rows two or more times (once before update and more than once after).
>
> If you change the outer query to "select auth_id, expiration from
> AuthTable order by +auth_id, +expiration" then you will PROBABLY never have
> a problem since the results will LIKELY be from a sorter and not from the
> underlying table, and therefore mutation of the underlying tables and
> indexes will not interfere with the result of the outer select, even if
> those mutations affect the AuthTable or the indexes on it.  Some SQL
> varients use a FOR UPDATE clause on a SELECT to tell the query planner that
> you intend to dally-about with the underlying datastore without having the
> proper isolation in place.  The SQLite way of doing this is by requesting a
> row sorter not dependent on indexes by using the +columnname syntax in an
> order by on the select.
>
> Your outer query should probably be "select auth_id, expiration from
> AuthTable where expiration <= ? order by +auth_id, +expiration" and binding
> current_time as the parameter since there is no point in retrieving rows
> that you will not be updating is there?
>
> The correct solution is, of course, to use separate connections so that
> you have isolation between the select and the updates.
>
> You SHOULD be executing the outer select on one connection and the updates
> on another connection.  This will work for journal mode delete unless the
> number of changed pages is too large to fit in sqlite's cache, in which
> case you may get an error from the update statement when it needs to spill
> the cache, and you will need to kaibosh the whole thing and do the updates
> in smaller chunks by putting a limit on the outer select and looping the
> whole thing until there are no more rows to process.  (or increase the
> cache_size to be sufficient).
>
> You can avoid that particular problem by having the database in
> journal_mode=WAL in which case you can even process each update in its own
> transaction if you wish (get rid of the db2.beginimmediate() and
> db2.commit(), though then you will have to handle the eventuality of
> getting errors on the UPDATE).
>
> db1 = Connection('database.db')
> db1.executescript('pragma journal_mode=WAL;')
> db2 = Connection('database.db')
> current_time = datetime.now()
> current_time_plus_one_year = current_time.add(years=1)
> sess_id = ... some constant ...
> db2.beginimmediate()
> for row in db1.execute('select auth_id, expiration from authtable where
> expiration <= ?;',
>                        (current_time,)):
>     db2.execute('update authtable set sesCookie = ?, expiration = ? where
> auth_id = ?;',
>                 (generate_ses_id(sess_id), current_time_plus_one_year,
> row.auth_id,))
> db2.commit()
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Endless loop possible with simultaneous SELECT and UPDATE?

Keith Medcalf

On Sunday, 1 September, 2019 11:12, Alexander Vega <[hidden email]> wrote:

>Thank you Keith for your answer. It has led me to more questions.

>"though you may or may not have visited all rows"
>From the documentation I did not get the impression that you would
>ever not visit ALL ROWS at least once. Is there a technical reason
>for this? I would assume a full table scan is walking the un-ordered
>leaf pages of the B*tree?

How do you know that you are doing a table scan?  This certainly cannot be assumed.  Perhaps the AuthTable has 57 columns with a total length of several hundred bytes per row but there also happens to be an index on a subset of the columns that includes the two columns that you have asked for.  Perhaps you are "table scanning" that covering index instead (because it is cheaper than reading the actual table)?  There are ways to insist on a table scan (select ... from table NOT INDEXED ...) for example.  However, you left it up to the database engine to choose the most cost effective way to answer your select (which is how SQL works ... it is a declarative language ... you declare what you want and the database figures out the best way to go about giving you what you asked for).

As a result of updating the first such row thus received the index has now changed such that the row you are operating on became the last row in the index being scanned.  Therefore there is no "next" row.  You will have visited only one row, even though there might have been millions of rows in the table.

>"Your outer query should probably be "select auth_id, expiration from
>AuthTable where expiration <= ? order by +auth_id, +expiration" and
>binding current_time as the parameter since there is no point in
>retrieving rows that you will not be updating is there?  "

>You are correct that does make sense. I guess I was trying avoid any
>ambiguities of a WHERE clause on the SELECT because I do not
>understand its behavior in this circumstance.

If you cannot understand the behaviour with a WHERE clause, then what would make you think that one without a WHERE clause would be any more transparent, especially given that all Relational Databases are designed to provide you the results you asked for as efficiently as possible?  Perhaps in a few days you will discover that you need to create another index for some other purpose, and that causes SQLite3 to obtain what you said you wanted in an entirely different manner.  When you make any change to the database do you re-evaluate the implementation details of every previously written SQL statement to see if it still compatible with the details you depended on?  What about it you update the version of SQLite3?  You should not be dependent on the peculiarities of the implementation since they might change at any time.

>You mentioned two database connections to the same database. Is this
>going to work if I am using Threadsafe mode = 0?

Yes.  Threadsafe mode only affects programs having multiple threads making calls into the sqlite3 library.  These are independent variables (that is you can have X threads and Y connections, and X is independent of Y) just because you have 47 connections does not mean that you have more than 1 thread, nor does having 47 threads mean that you have more than 1 connection.  Threads are commenced with _beginthread (or equivalent for the OS) calls and connections are commenced with sqlite3_open* calls.  The _beginthread operations result in the creation of a thread and the sqlite3_open* calls create a database connection -- they are not related to each other in any way.  Also consider that it is entirely possible for a program to have hundreds of threads yet still only be single-threaded as far as sqlite3 is concerned if only one of those threads makes use of the sqlite3 library, and that one thread may use hundreds of database connections either serially or in parallel or in some combination thereof.

>Would the second connection be done through an attach?

No.  The attach statement attaches a database to a connection.  You have to have opened the connection first.  Connections are created with the sqlite3_open* functions which return a pointer to a database connection.

>Does this conversation change if I wrap the whole select and updates
>in one transaction? e.g. BEGIN...END

No, because isolation is only BETWEEN connections, not WITHIN connections.  And the transaction state is per connection.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Endless loop possible with simultaneous SELECT and UPDATE?

Alexander Vega
So in the original code if I added a NOT INDEXED it would be valid? Also,
would an ORDER BY Auth_id ASC fix the issue, since I an not adding any new
rows the auth_ids would remain constant?

Wow I did not know that you could call open multiple times on the same
database! So the following is valid and safe (considering POSIX locking
etc.)
sqlite3_open("database1", &db1);
sqlite3_open("database1", &db2);
// loop start
// BEGIN TRANSACTION BOTH dbs
// do SELECT on db1
// do UPDATE on db2
// END TRANSACTION
// loop end
// close db1
// use db2 for rest of the program...

I have not seen this pattern within any sqlite3 code before so I think it
is not well known that this is a design pattern for searching and updating
at the same time.

If I were determined to stay within defined behavior AND to only use one
connection, would the following updated code attain that.

#define SELECT_EXPIRED_IDS \
"SELECT Auth_id FROM AuthTable WHERE Auth_id > ? AND expiration < ? " \
"ORDER BY Auth_id ASC LIMIT 128;"

#define UPDATE_SESID_EXPIRED \
"UPDATE AuthTable SET sesCookie=?, expiration=? WHERE Auth_id=?;"

static void
expire_sesid(void)
{
int auth_ids[128] = {0};
int i=0, j;
/* get raw current time */
current_time = get_current_db_time(false);

/* prepare SQL queries */
sqlite3_prepare_v2(db,   SELECT_EXPIRED_IDS,
-1, &expire_info, NULL);
sqlite3_prepare_v2(db,  UPDATE_SESID_EXPIRED,
-1, &update_ses_expired, NULL);

do {
  sqlite3_bind_int(expire_info, 1, auth_ids[i]);
  sqlite3_bind_int(expire_info, 2, current_time);
  /* while there is work to be done */
  i=0;
  while (sqlite3_step(expire_info) == SQLITE_ROW) {
    auth_ids[i++] = sqlite3_column_int(expire_info, 0);   /* auth_id */
  }
  sqlite3_reset(expire_info);

  for (j=0; j < i; j++) {
    /* generate new session id */
    generate_ses_id(ses_id);
    /* update ses_id and internal expiration to a year ahead, log in
    will set it to a month for user log in */
    sqlite3_bind_text(update_ses_expired, 1, ses_id,
           16, SQLITE_STATIC);
    sqlite3_bind_int64(update_ses_expired, 2, current_time_plus_year);
    sqlite3_bind_int(update_ses_expired, 3, auth_ids[j]);
    sqlite3_step(update_ses_expired);
    sqlite3_reset(update_ses_expired);
  }
} while ( i == 128 );
/* all work has completed */
sqlite3_finalize(expire_info);
sqlite3_finalize(update_ses_expired);
return;
}




On Sun, Sep 1, 2019 at 3:47 PM Keith Medcalf <[hidden email]> wrote:

>
> On Sunday, 1 September, 2019 11:12, Alexander Vega <[hidden email]>
> wrote:
>
> >Thank you Keith for your answer. It has led me to more questions.
>
> >"though you may or may not have visited all rows"
> >From the documentation I did not get the impression that you would
> >ever not visit ALL ROWS at least once. Is there a technical reason
> >for this? I would assume a full table scan is walking the un-ordered
> >leaf pages of the B*tree?
>
> How do you know that you are doing a table scan?  This certainly cannot be
> assumed.  Perhaps the AuthTable has 57 columns with a total length of
> several hundred bytes per row but there also happens to be an index on a
> subset of the columns that includes the two columns that you have asked
> for.  Perhaps you are "table scanning" that covering index instead (because
> it is cheaper than reading the actual table)?  There are ways to insist on
> a table scan (select ... from table NOT INDEXED ...) for example.  However,
> you left it up to the database engine to choose the most cost effective way
> to answer your select (which is how SQL works ... it is a declarative
> language ... you declare what you want and the database figures out the
> best way to go about giving you what you asked for).
>
> As a result of updating the first such row thus received the index has now
> changed such that the row you are operating on became the last row in the
> index being scanned.  Therefore there is no "next" row.  You will have
> visited only one row, even though there might have been millions of rows in
> the table.
>
> >"Your outer query should probably be "select auth_id, expiration from
> >AuthTable where expiration <= ? order by +auth_id, +expiration" and
> >binding current_time as the parameter since there is no point in
> >retrieving rows that you will not be updating is there?  "
>
> >You are correct that does make sense. I guess I was trying avoid any
> >ambiguities of a WHERE clause on the SELECT because I do not
> >understand its behavior in this circumstance.
>
> If you cannot understand the behaviour with a WHERE clause, then what
> would make you think that one without a WHERE clause would be any more
> transparent, especially given that all Relational Databases are designed to
> provide you the results you asked for as efficiently as possible?  Perhaps
> in a few days you will discover that you need to create another index for
> some other purpose, and that causes SQLite3 to obtain what you said you
> wanted in an entirely different manner.  When you make any change to the
> database do you re-evaluate the implementation details of every previously
> written SQL statement to see if it still compatible with the details you
> depended on?  What about it you update the version of SQLite3?  You should
> not be dependent on the peculiarities of the implementation since they
> might change at any time.
>
> >You mentioned two database connections to the same database. Is this
> >going to work if I am using Threadsafe mode = 0?
>
> Yes.  Threadsafe mode only affects programs having multiple threads making
> calls into the sqlite3 library.  These are independent variables (that is
> you can have X threads and Y connections, and X is independent of Y) just
> because you have 47 connections does not mean that you have more than 1
> thread, nor does having 47 threads mean that you have more than 1
> connection.  Threads are commenced with _beginthread (or equivalent for the
> OS) calls and connections are commenced with sqlite3_open* calls.  The
> _beginthread operations result in the creation of a thread and the
> sqlite3_open* calls create a database connection -- they are not related to
> each other in any way.  Also consider that it is entirely possible for a
> program to have hundreds of threads yet still only be single-threaded as
> far as sqlite3 is concerned if only one of those threads makes use of the
> sqlite3 library, and that one thread may use hundreds of database
> connections either serially or in parallel or in some combination thereof.
>
> >Would the second connection be done through an attach?
>
> No.  The attach statement attaches a database to a connection.  You have
> to have opened the connection first.  Connections are created with the
> sqlite3_open* functions which return a pointer to a database connection.
>
> >Does this conversation change if I wrap the whole select and updates
> >in one transaction? e.g. BEGIN...END
>
> No, because isolation is only BETWEEN connections, not WITHIN
> connections.  And the transaction state is per connection.
>
> --
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Endless loop possible with simultaneous SELECT and UPDATE?

Simon Slavin-3
On 3 Sep 2019, at 5:34pm, Alexander Vega <[hidden email]> wrote:

> sqlite3_open("database1", &db1);
> sqlite3_open("database1", &db2);

Bear in mind that SQLite is not a server/client DBMS.  The database is not kept in memory (unless you arrange this explicitly).  All operations have to wait for the storage that holds the database.

So yes, you can open two connections.  But it's not going to magically double the speed of your program.

> // do SELECT on db1
> // do UPDATE on db2

Do you expect the SELECT to see the results of the previous UPDATE ?  It won't, until the transaction has ended (unless you arrange this explicitly).
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Endless loop possible with simultaneous SELECT and UPDATE?

Kees Nuyt
On Tue, 3 Sep 2019 18:26:01 +0100, you wrote:

>> // do SELECT on db1
>> // do UPDATE on db2
>
> Do you expect the SELECT to see the results of the previous
> UPDATE ?  It won't, until the transaction has ended
> (unless you arrange this explicitly).

That's the nice thing about this construct:
isolation between the SELECT and the UPDATE,
the pattern is indeed:

* Iterate over unchanged rows,
  without seeing DELETEs, UPDATEs and INSERTs
  on the tables in the select,

* Be free to act upon the original rows
  without disturbance of the read.

I think it only works for WAL journal mode.

--
Regards,
Kees Nuyt

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Endless loop possible with simultaneous SELECT and UPDATE?

Keith Medcalf

On Tuesday, 3 September, 2019 15:01, Kees Nuyt <[hidden email]> wrote:

>On Tue, 3 Sep 2019 18:26:01 +0100, you wrote:

>>> // do SELECT on db1
>>> // do UPDATE on db2

>> Do you expect the SELECT to see the results of the previous
>> UPDATE ?  It won't, until the transaction has ended
>> (unless you arrange this explicitly).

>That's the nice thing about this construct:
>isolation between the SELECT and the UPDATE,
>the pattern is indeed:

>* Iterate over unchanged rows,
>  without seeing DELETEs, UPDATEs and INSERTs
>  on the tables in the select,

>* Be free to act upon the original rows
>  without disturbance of the read.

>I think it only works for WAL journal mode.

It will work just fine in non-WAL mode provided that the number of changed pages made by the updates does not exceed the size of the cache (that is, the changes on db2 do not have to spill the cache and thus obtain an exclusive lock, that it will not be able to obtain, prior to commit time).  You of course have to start an ordinary transaction on db1 and an immediate transaction on db2 before starting the query/change loop, and commit db1 before committing db2 after the loop runs out of rows (so that you do not deadlock yourself when the commit on db2 attempts to upgrade to an exclusive lock).

In WAL journal mode you do not have to worry any of that at all since an open read will not prevent a write on a different connection from spilling the cache or committing.

You can do it all on one connection using only a single immediate transaction (whether in WAL journal mode or not) if you are sure the query in the outer loop cannot be using an index modified by the inner update/delete/insert operations.  One way to do this is to require the use of a sorter by that query so that all the results must be gathered before the first row is returned (such as by using the ORDER BY +colname implementation detail).

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users