sqlite3_step and ORDER BY random() question.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

sqlite3_step and ORDER BY random() question.

Reid Thompson
hi, and thanks.

my questions are, if I prepare and utilize a statement for a result set
in the tens of thousands (or more) using a where clause along the lines
of
   "ORDER BY batch_id, due_datetime, random()"

1) As I sqlite3_step through the result set, am I guaranteed to get each
row only once?

2) is the order set only once on the first sqlite3_step, or does it
change with each sqlite3_step invocation?

thanks,
reid

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Simon Slavin-3

On 10 Apr 2017, at 9:28pm, Reid Thompson <[hidden email]> wrote:

> my questions are, if I prepare and utilize a statement for a result set
> in the tens of thousands (or more) using a where clause along the lines
> of
>    "ORDER BY batch_id, due_datetime, random()"
>
> 1) As I sqlite3_step through the result set, am I guaranteed to get each
> row only once?

Yes.  _step() will return each row exactly once.  This assumes you will not make any changes to the table those rows are in until you have finished stepping.  If you make any changes to the table before the last call to _step() then things may get more complicated.

> 2) is the order set only once on the first sqlite3_step, or does it
> change with each sqlite3_step invocation?

Because no index already exists for that ORDER, you can imagine that SQLite makes up a temporary index for the results when you do the first sqlite3_step(), and deletes it when you execute sqlite3_reset() or sqlite3_finalize().

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Reid Thompson
On Mon, 2017-04-10 at 21:39 +0100, Simon Slavin wrote:

> On 10 Apr 2017, at 9:28pm, Reid Thompson <[hidden email]> wrote:

> > my questions are, if I prepare and utilize a statement for a result set
> > in the tens of thousands (or more) using a where clause along the lines
> > of
> >    "ORDER BY batch_id, due_datetime, random()"
> > 
> > 1) As I sqlite3_step through the result set, am I guaranteed to get each
> > row only once?

> Yes.  _step() will return each row exactly once.  This assumes you will not make any changes to the table those rows are in until you have finished stepping.  If you make any changes to the table
> before the last call to _step() then things may get more complicated.

Thank you Simon.

so things may get more complicated:

Does 'more complicated' in the below scenario mean that there is
potential for records to be returned more than once if I use random()?

I have a manager process that manages the above step'ing. It gathers
records in batches of 30, marks them as claimed, and forwards those
batches to one of 50 worker processes. Each worker process performs work
based on each records data and when complete provides that information
back to the manager along with the record data.  The manager process
uses the returned information to update the records.  When a worker
completes a batche, the manager process sends them another batch. So
in most cases, I'm performing two updates to the record while step'ing
through the result set.


> > 2) is the order set only once on the first sqlite3_step, or does it
> > change with each sqlite3_step invocation?

> Because no index already exists for that ORDER, you can imagine that SQLite makes up a temporary index for the results when you do the first sqlite3_step(), and deletes it when you execute
> sqlite3_reset() or sqlite3_finalize().

> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Richard Hipp-3
On 4/11/17, Reid Thompson <[hidden email]> wrote:

> On Mon, 2017-04-10 at 21:39 +0100, Simon Slavin wrote:
>> On 10 Apr 2017, at 9:28pm, Reid Thompson <[hidden email]> wrote:
>>
>> > my questions are, if I prepare and utilize a statement for a result set
>> > in the tens of thousands (or more) using a where clause along the lines
>> > of
>> >    "ORDER BY batch_id, due_datetime, random()"
>> >
>> > 1) As I sqlite3_step through the result set, am I guaranteed to get
>> > each
>> > row only once?
>>
>> Yes.  _step() will return each row exactly once.  This assumes you will
>> not make any changes to the table those rows are in until you have
>> finished stepping.  If you make any changes to the table
>> before the last call to _step() then things may get more complicated.
>
>
> I have a manager process that manages the above step'ing. It gathers
> records in batches of 30, marks them as claimed, and forwards those
> batches to one of 50 worker processes. Each worker process performs work
> based on each records data and when complete provides that information
> back to the manager along with the record data.  The manager process
> uses the returned information to update the records.  When a worker
> completes a batche, the manager process sends them another batch. So
> in most cases, I'm performing two updates to the record while step'ing
> through the result set.

If you make changes to the table being queried in the middle of the
query, then there are no guarantees.

In your case, you are *probably* ok (for now) because the ORDER BY is
probably not being accomplished using an index.  But if batch_id and
due_datetime are both NOT NULL and unique, then the third ORDER BY
term will be ignored, and the UNIQUE index will be used to order the
output.  And in that case, if any of your worker threads modify the
batch_id or due_datetime fields, then there could be issues.

A safe work-around is to run the query into a TEMP table:

   CREATE TEMP TABLE work_to_do AS SELECT .....

Then query the work_to_do table for stuff to be modified.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Simon Slavin-3
In reply to this post by Reid Thompson

On 11 Apr 2017, at 12:33pm, Reid Thompson <[hidden email]> wrote:

> Does 'more complicated' in the below scenario mean that there is
> potential for records to be returned more than once if I use random()?

The problem is not to do with random().  The problem is to do with modifying a field used in your SELECT command.  Suppose you have an index on the column "name" and you execute

SELECT name FROM treetypes ORDER BY name

with the following names

apple
lemon
oak
orange
teak
willow

You execute _step() twice, returning the top two, then issue

UPDATE treetypes SET name="citrus" WHERE name="orange"

What do you expect SQLite to do for the SELECT ?  Should it return a list of names which is clearly not in the order you asked for ?  Should it miss out one of the rows even though you asked for them all ?  Should it return "orange" even though you have already changed that value ?

That’s the "more complicated".  And for the sanity of people debugging your code it’s better to ensure it never happens.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Reid Thompson
In reply to this post by Richard Hipp-3
On Tue, 2017-04-11 at 07:39 -0400, Richard Hipp wrote:

> On 4/11/17, Reid Thompson <[hidden email]> wrote:
> > On Mon, 2017-04-10 at 21:39 +0100, Simon Slavin wrote:
> > > On 10 Apr 2017, at 9:28pm, Reid Thompson <[hidden email]> wrote:
> > > 
> > > > my questions are, if I prepare and utilize a statement for a result set
> > > > in the tens of thousands (or more) using a where clause along the lines
> > > > of
> > > >    "ORDER BY batch_id, due_datetime, random()"
> > > > 
> > > > 1) As I sqlite3_step through the result set, am I guaranteed to get
> > > > each
> > > > row only once?
> > > 
> > > Yes.  _step() will return each row exactly once.  This assumes you will
> > > not make any changes to the table those rows are in until you have
> > > finished stepping.  If you make any changes to the table
> > > before the last call to _step() then things may get more complicated.
> > 
> > 
> > I have a manager process that manages the above step'ing. It gathers
> > records in batches of 30, marks them as claimed, and forwards those
> > batches to one of 50 worker processes. Each worker process performs work
> > based on each records data and when complete provides that information
> > back to the manager along with the record data.  The manager process
> > uses the returned information to update the records.  When a worker
> > completes a batche, the manager process sends them another batch. So
> > in most cases, I'm performing two updates to the record while step'ing
> > through the result set.

> If you make changes to the table being queried in the middle of the
> query, then there are no guarantees.

> In your case, you are *probably* ok (for now) because the ORDER BY is
> probably not being accomplished using an index.  But if batch_id and
> due_datetime are both NOT NULL and unique, then the third ORDER BY
> term will be ignored, and the UNIQUE index will be used to order the
> output.  And in that case, if any of your worker threads modify the
> batch_id or due_datetime fields, then there could be issues.

> A safe work-around is to run the query into a TEMP table:

>    CREATE TEMP TABLE work_to_do AS SELECT .....

> Then query the work_to_do table for stuff to be modified.

Thank you.  I'll make adjustments to my process.  One follow on
question.  This would be a concern regardless of whether random() is
used or not in the ORDER BY clause? 

reid
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Reid Thompson

> Thank you.  I'll make adjustments to my process.  One follow on
> question.  This would be a concern regardless of whether random() is
> used or not in the ORDER BY clause? 

> reid

sorry - I posted this before receiving/seeing Simon's response.

reid
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Reid Thompson
In reply to this post by Simon Slavin-3
On Tue, 2017-04-11 at 13:17 +0100, Simon Slavin wrote:

> On 11 Apr 2017, at 12:33pm, Reid Thompson <[hidden email]> wrote:

> > Does 'more complicated' in the below scenario mean that there is
> > potential for records to be returned more than once if I use random()?

> The problem is not to do with random().  The problem is to do with modifying a field used in your SELECT command.  Suppose you have an index on the column "name" and you execute

> SELECT name FROM treetypes ORDER BY name

> with the following names

> apple
> lemon
> oak
> orange
> teak
> willow

> You execute _step() twice, returning the top two, then issue

> UPDATE treetypes SET name="citrus" WHERE name="orange"

> What do you expect SQLite to do for the SELECT ?  Should it return a list of names which is clearly not in the order you asked for ?  Should it miss out one of the rows even though you asked for
> them all ?  Should it return "orange" even though you have already changed that value ?

> That’s the "more complicated".  And for the sanity of people debugging your code it’s better to ensure it never happens.

> Simon.

Thanks again for the explanation. 

> That’s the "more complicated".  And for the sanity of people debugging your code it’s better to ensure it never happens.

;) yes. I'm debugging some inherited code.

I/and the original implementer, am/were more familiar with PostgreSQL's
MVCC.  So I think the issue was the assumption that the query being
stepped through would only ever see the rows as they were at the start
of the query and would walk through them from first to last.

Would
PRAGMA journal_mode=WAL;
provide that?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Simon Slavin-3

On 11 Apr 2017, at 2:16pm, Reid Thompson <[hidden email]> wrote:

> I/and the original implementer, am/were more familiar with PostgreSQL's
> MVCC.  So I think the issue was the assumption that the query being
> stepped through would only ever see the rows as they were at the start
> of the query and would walk through them from first to last.
>
> Would
> PRAGMA journal_mode=WAL;
> provide that?

Your situation would be that you have a single connection to the database, and it’s being used by two statements at the same time ?

You’re beyond what I can promise.  I need someone else to answer that.

The answer might be different if the SELECT was using a different connection to the one which was modifying the table.  Again, I hope someone else can help.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Richard Hipp-3
In reply to this post by Reid Thompson
On 4/11/17, Reid Thompson <[hidden email]> wrote:
>
> I/and the original implementer, am/were more familiar with PostgreSQL's
> MVCC.  So I think the issue was the assumption that the query being
> stepped through would only ever see the rows as they were at the start
> of the query and would walk through them from first to last.
>

That is true, as long as the query and the workers are all using
separate database connections.  Trouble only arises when the query is
run partially - not to completion - and then the table being queried
is updated *on the same database connection*.  Changes implemented on
separate database connections are isolated and will not change the
result of the original query.

The problem scenario is impossible on PostgreSQL or any other
client-server database because the client/server databases run the
whole query all the way to the end before returning the results to
you.  It has nothing to do with MVCC.

You can emulate the PostgreSQL behavior by:

(1) Saving the query results in a TEMP table and then querying the
TEMP table separately and running the workers based on the TEMP table.

(2) Use the sqlite3_get_table() (or a similar wrapper of your own
concoction) to load all query results into memory prior to starting
the UPDATEs.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sqlite3_step and ORDER BY random() question.

Keith Medcalf
In reply to this post by Reid Thompson

> I/and the original implementer, am/were more familiar with PostgreSQL's
> MVCC.  So I think the issue was the assumption that the query being
> stepped through would only ever see the rows as they were at the start
> of the query and would walk through them from first to last.
>
> Would
> PRAGMA journal_mode=WAL;
> provide that?

Yes. WAL journal mode provides "Repeatable Read" isolation within a transaction.  This means that if you issue:

BEGIN;
SELECT .....
wait 3 hours with lots of intervening updates to all the tables in the database including the ones in your select above, done on a different connection.
SELECT .....
COMMIT;
SELECT .....

Assuming that all the selects are the same, then the first two selects will return identical results.  All the updates will not be seen until after the read transaction is committed, so the third select will see the updates done in the updates that occurred on different connections.

If you do not explicitly BEGIN and COMMIT transactions, then SQLite3 effectively wraps each statement in its own transaction.  Note that without WAL, the above example will prohibit updates on other connections during the 3 hour perios you are holding the transaction open since without WAL, readers block writers and writers block readers.

Even in WAL mode changes to the database made ON THE SAME CONNECTION are immediately visible to every statement on that connection.


--
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Reid Thompson
> Sent: Tuesday, 11 April, 2017 07:17
> To: [hidden email]
> Subject: Re: [sqlite] sqlite3_step and ORDER BY random() question.
>
> On Tue, 2017-04-11 at 13:17 +0100, Simon Slavin wrote:
> > On 11 Apr 2017, at 12:33pm, Reid Thompson <[hidden email]>
> wrote:
> >
> > > Does 'more complicated' in the below scenario mean that there is
> > > potential for records to be returned more than once if I use random()?
> >
> > The problem is not to do with random().  The problem is to do with
> modifying a field used in your SELECT command.  Suppose you have an index
> on the column "name" and you execute
> >
> > SELECT name FROM treetypes ORDER BY name
> >
> > with the following names
> >
> > apple
> > lemon
> > oak
> > orange
> > teak
> > willow
> >
> > You execute _step() twice, returning the top two, then issue
> >
> > UPDATE treetypes SET name="citrus" WHERE name="orange"
> >
> > What do you expect SQLite to do for the SELECT ?  Should it return a
> list of names which is clearly not in the order you asked for ?  Should it
> miss out one of the rows even though you asked for
> > them all ?  Should it return "orange" even though you have already
> changed that value ?
> >
> > That’s the "more complicated".  And for the sanity of people debugging
> your code it’s better to ensure it never happens.
> >
> > Simon.
>
> Thanks again for the explanation.
>
> > That’s the "more complicated".  And for the sanity of people debugging
> your code it’s better to ensure it never happens.
>
> ;) yes. I'm debugging some inherited code.
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...