random value get re-generated too often in SQLite

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

random value get re-generated too often in SQLite

John Mount
I am seeing an issue where a random value in a sub-query gets re-generated each time the value is used in later queries.  Below is an example query:

SELECT r AS r1, r AS r2 FROM ( SELECT random() AS r FROM ( SELECT * from ( VALUES(1),(2) ) ) a ) b

One would expect r1 == r2.

---------------
John Mount
http://www.win-vector.com/ <http://www.win-vector.com/>
Our book: Practical Data Science with R http://www.manning.com/zumel/ <http://www.manning.com/zumel/>



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: random value get re-generated too often in SQLite

John McKown
On Fri, Dec 8, 2017 at 12:54 PM, John Mount <[hidden email]> wrote:

> I am seeing an issue where a random value in a sub-query gets re-generated
> each time the value is used in later queries.  Below is an example query:
>
> SELECT r AS r1, r AS r2 FROM ( SELECT random() AS r FROM ( SELECT * from (
> VALUES(1),(2) ) ) a ) b
>
> One would expect r1 == r2.
>
> ---------------
> John Mount
>

​Interesting. I duplicated your example using PostgreSQL and it does what
you said you expected: R1==R2. In SQLite3, the .eqp full shows:

sqlite> .eqp full
sqlite> SELECT r AS r1, r AS r2 FROM ( SELECT random() AS r FROM ( SELECT *
from ( VALUES(1),(2) ) ) a ) b;
--EQP-- 0,0,0,SCAN SUBQUERY 1
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     0     0                    00  Start at 0
1     InitCoroutine  1     7     2                    00
sqlite_sq_5619D81F9BF8
2     Integer        1     2     0                    00  r[2]=1
3     Yield          1     0     0                    00
4     Integer        2     2     0                    00  r[2]=2
5     Yield          1     0     0                    00
6     EndCoroutine   1     0     0                    00
7     InitCoroutine  1     0     2                    00
8       Yield          1     13    0                    00  next row of
"sqlite_sq_5619D81F9BF8"
9       Function0      0     0     4     random(0)      00  r[4]=func(r[0])
10      Function0      0     0     5     random(0)      00  r[5]=func(r[0])
11      ResultRow      4     2     0                    00  output=r[4..5]
12    Goto           0     8     0                    00
13    Halt           0     0     0                    00
4548137244590923354  8821858240296964415
761559492082550893  2723588653195689097

I think this is being done due to the "subquery flattening" as described on
http://sqlite.org/optoverview.html, which says:

To overcome this problem, SQLite attempts to flatten subqueries in the FROM
clause of a SELECT. This involves inserting the FROM clause of the subquery
into the FROM clause of the outer query and rewriting expressions in the
outer query that refer to the result set of the subquery. For example:

So your "SELECT r AS r1, r AS r2 FROM ( SELECT random() AS r FROM ( SELECT
* from ( VALUES(1),(2) ) ) a ) b;" ​be​comes

"SELECT random() AS r1, random() AS r2 FROM ( SELECT * from (values(1),(2))
a) b;"

I think the above is born out if you put the word DISTINCT in front of the
random() in the original example. This forces SQLite to _not_ use subquery
flattening. Example:

sqlite> SELECT  r AS r1, r AS r2 FROM ( SELECT distinct random() AS r FROM
( SELECT * from ( VALUES(1),(2) ) c ) a ) b;
--EQP-- 1,0,0,SCAN SUBQUERY 2 AS c
--EQP-- 1,0,0,USE TEMP B-TREE FOR DISTINCT
--EQP-- 0,0,0,SCAN SUBQUERY 1 AS b
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     0     0                    00  Start at 0
1     InitCoroutine  1     18    2                    00
sqlite_sq_55E270A58EA8
2     InitCoroutine  2     8     3                    00
sqlite_sq_55E270A58AA8
3     Integer        1     3     0                    00  r[3]=1
4     Yield          2     0     0                    00
5     Integer        2     3     0                    00  r[3]=2
6     Yield          2     0     0                    00
7     EndCoroutine   2     0     0                    00
8     OpenEphemeral  3     0     0     k(1,B)         08  nColumn=0
9     InitCoroutine  2     0     3                    00
10      Yield          2     17    0                    00  next row of
"sqlite_sq_55E270A58AA8"
11      Function0      0     0     5     random(0)      00  r[5]=func(r[0])
12      Found          3     16    5     1              00  key=r[5]
13      MakeRecord     5     1     6                    00  r[6]=mkrec(r[5])
14      IdxInsert      3     6     0                    00  key=r[6]
15      Yield          1     0     0                    00
16    Goto           0     10    0                    00
17    EndCoroutine   1     0     0                    00
18    InitCoroutine  1     0     2                    00
19      Yield          1     24    0                    00  next row of
"sqlite_sq_55E270A58EA8"
20      Copy           5     7     0                    00  r[7]=r[5];
sqlite_sq_55E270A58EA8.r
21      Copy           7     8     0                    00  r[8]=r[7]
22      ResultRow      7     2     0                    00  output=r[7..8]
23    Goto           0     19    0                    00
24    Halt           0     0     0                    00
920225462863128947|920225462863128947
-723158119245037038|-723158119245037038

​Hope this at least explains what is happening. I don't know if this is
"correct" or not.​


--
I have a theory that it's impossible to prove anything, but I can't prove
it.

Maranatha! <><
John McKown
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: random value get re-generated too often in SQLite

Scott Doctor
In reply to this post by John Mount

Is it possible that the first call to random is cached and the
cached value is being returned in subsequent calls?

-------------------------
Scott Doctor
[hidden email]
-------------------------

On 12/8/2017 12:09, John McKown wrote:

> On Fri, Dec 8, 2017 at 12:54 PM, John Mount <[hidden email]> wrote:
>
>> I am seeing an issue where a random value in a sub-query gets re-generated
>> each time the value is used in later queries.  Below is an example query:
>>
>> SELECT r AS r1, r AS r2 FROM ( SELECT random() AS r FROM ( SELECT * from (
>> VALUES(1),(2) ) ) a ) b
>>
>> One would expect r1 == r2.
>>
>> ---------------
>> John Mount
>>
> ​Interesting. I duplicated your example using PostgreSQL and it does what
> you said you expected: R1==R2. In SQLite3, the .eqp full shows:
>
> sqlite> .eqp full
> sqlite> SELECT r AS r1, r AS r2 FROM ( SELECT random() AS r FROM ( SELECT *
> from ( VALUES(1),(2) ) ) a ) b;
> --EQP-- 0,0,0,SCAN SUBQUERY 1
> addr  opcode         p1    p2    p3    p4             p5  comment
> ----  -------------  ----  ----  ----  -------------  --  -------------
> 0     Init           0     0     0                    00  Start at 0
> 1     InitCoroutine  1     7     2                    00
> sqlite_sq_5619D81F9BF8
> 2     Integer        1     2     0                    00  r[2]=1
> 3     Yield          1     0     0                    00
> 4     Integer        2     2     0                    00  r[2]=2
> 5     Yield          1     0     0                    00
> 6     EndCoroutine   1     0     0                    00
> 7     InitCoroutine  1     0     2                    00
> 8       Yield          1     13    0                    00  next row of
> "sqlite_sq_5619D81F9BF8"
> 9       Function0      0     0     4     random(0)      00  r[4]=func(r[0])
> 10      Function0      0     0     5     random(0)      00  r[5]=func(r[0])
> 11      ResultRow      4     2     0                    00  output=r[4..5]
> 12    Goto           0     8     0                    00
> 13    Halt           0     0     0                    00
> 4548137244590923354  8821858240296964415
> 761559492082550893  2723588653195689097
> ​
> I think this is being done due to the "subquery flattening" as described on
> http://sqlite.org/optoverview.html, which says:
>
> To overcome this problem, SQLite attempts to flatten subqueries in the FROM
> clause of a SELECT. This involves inserting the FROM clause of the subquery
> into the FROM clause of the outer query and rewriting expressions in the
> outer query that refer to the result set of the subquery. For example:
>
> So your "SELECT r AS r1, r AS r2 FROM ( SELECT random() AS r FROM ( SELECT
> * from ( VALUES(1),(2) ) ) a ) b;" ​be​comes
>
> "SELECT random() AS r1, random() AS r2 FROM ( SELECT * from (values(1),(2))
> a) b;"
>
> I think the above is born out if you put the word DISTINCT in front of the
> random() in the original example. This forces SQLite to _not_ use subquery
> flattening. Example:
>
> sqlite> SELECT  r AS r1, r AS r2 FROM ( SELECT distinct random() AS r FROM
> ( SELECT * from ( VALUES(1),(2) ) c ) a ) b;
> --EQP-- 1,0,0,SCAN SUBQUERY 2 AS c
> --EQP-- 1,0,0,USE TEMP B-TREE FOR DISTINCT
> --EQP-- 0,0,0,SCAN SUBQUERY 1 AS b
> addr  opcode         p1    p2    p3    p4             p5  comment
> ----  -------------  ----  ----  ----  -------------  --  -------------
> 0     Init           0     0     0                    00  Start at 0
> 1     InitCoroutine  1     18    2                    00
> sqlite_sq_55E270A58EA8
> 2     InitCoroutine  2     8     3                    00
> sqlite_sq_55E270A58AA8
> 3     Integer        1     3     0                    00  r[3]=1
> 4     Yield          2     0     0                    00
> 5     Integer        2     3     0                    00  r[3]=2
> 6     Yield          2     0     0                    00
> 7     EndCoroutine   2     0     0                    00
> 8     OpenEphemeral  3     0     0     k(1,B)         08  nColumn=0
> 9     InitCoroutine  2     0     3                    00
> 10      Yield          2     17    0                    00  next row of
> "sqlite_sq_55E270A58AA8"
> 11      Function0      0     0     5     random(0)      00  r[5]=func(r[0])
> 12      Found          3     16    5     1              00  key=r[5]
> 13      MakeRecord     5     1     6                    00  r[6]=mkrec(r[5])
> 14      IdxInsert      3     6     0                    00  key=r[6]
> 15      Yield          1     0     0                    00
> 16    Goto           0     10    0                    00
> 17    EndCoroutine   1     0     0                    00
> 18    InitCoroutine  1     0     2                    00
> 19      Yield          1     24    0                    00  next row of
> "sqlite_sq_55E270A58EA8"
> 20      Copy           5     7     0                    00  r[7]=r[5];
> sqlite_sq_55E270A58EA8.r
> 21      Copy           7     8     0                    00  r[8]=r[7]
> 22      ResultRow      7     2     0                    00  output=r[7..8]
> 23    Goto           0     19    0                    00
> 24    Halt           0     0     0                    00
> 920225462863128947|920225462863128947
> -723158119245037038|-723158119245037038
>
> ​Hope this at least explains what is happening. I don't know if this is
> "correct" or not.​
>
>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: random value get re-generated too often in SQLite

Luuk
On 08-12-17 23:34, Scott Doctor wrote:
>
> Is it possible that the first call to random is cached and the cached
> value is being returned in subsequent calls?
>
> -------------------------
> Scott Doctor
> [hidden email]
> -------------------------
>

The easiest way to get this behaviour is to store the random value in a
temp table, than you can use it as long as you like.

sqlite> create temp table rnd (i integer);
sqlite> select * from rnd;
sqlite> insert into rnd values (random());
sqlite> select * from rnd;
7337803792641969525
sqlite>

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users