generate_series can theoretically behave differently in SQLite 3.23.0

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

generate_series can theoretically behave differently in SQLite 3.23.0

E.Pasma
Hello, below is a case where the generate_series virtual table behaves  
differently with SQLite 3.23. It is a purely artificial case. Does  
anyoone have an opinion about it?

The issue comes from the fact that generate_series does not check if a  
supposed integer input value happens to be NULL. And continues as if  
zero is passed in. This way a WHERE condition like

<VT column>=<expression that gives NULL>

can be true. Regular tables require "IS" instead of "=" here.  
Theoretically the left join reduction prover can now be misleaded. As  
is the case below,

Is this a (obscure) bug in generate_series?

E Pasma

.version
SQLite 3.23.0 2018-03-24 13:24:02  
cf171abe954a5f25262161dd69f2e8cecdbf9446c3f6b298201507dbc743567e
zlib version 1.2.3
gcc-4.0.1 (Apple Inc. build 5465)
.load series
create table t1(x);
create table t2(x,y);
insert into t1 values(1),(2);
insert into t2 values(1,4);
select t1.x, value
from t1
left join t2 using (x)
join generate_series on start=t2.y and stop=5;
1|4
1|5

Same script in earlier version with echo off.

.version
SQLite 3.22.0 2018-01-09 15:28:25  
6b3f4349d7156501f05270d494d6002ee03a0860b6e77b46d17effcab027b753
.echo off
1|4
1|5
2|0
2|1
2|2
2|3
2|4
2|5

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generate_series can theoretically behave differently in SQLite 3.23.0

Richard Hipp-3
On 4/3/18, E.Pasma <[hidden email]> wrote:
> Hello, below is a case where the generate_series virtual table behaves
> differently with SQLite 3.23. It is a purely artificial case. Does
> anyoone have an opinion about it?

Having pondered this overnight, I decided that it should probably be
fixed.  There is now a ticket:

     https://www.sqlite.org/src/info/fac496b61722daf2

Probably there will be a 3.23.1 patch release later today.

>
> The issue comes from the fact that generate_series does not check if a
> supposed integer input value happens to be NULL. And continues as if
> zero is passed in. This way a WHERE condition like
>
> <VT column>=<expression that gives NULL>
>
> can be true. Regular tables require "IS" instead of "=" here.
> Theoretically the left join reduction prover can now be misleaded. As
> is the case below,
>
> Is this a (obscure) bug in generate_series?
>
> E Pasma
>
> .version
> SQLite 3.23.0 2018-03-24 13:24:02
> cf171abe954a5f25262161dd69f2e8cecdbf9446c3f6b298201507dbc743567e
> zlib version 1.2.3
> gcc-4.0.1 (Apple Inc. build 5465)
> .load series
> create table t1(x);
> create table t2(x,y);
> insert into t1 values(1),(2);
> insert into t2 values(1,4);
> select t1.x, value
> from t1
> left join t2 using (x)
> join generate_series on start=t2.y and stop=5;
> 1|4
> 1|5
>
> Same script in earlier version with echo off.
>
> .version
> SQLite 3.22.0 2018-01-09 15:28:25
> 6b3f4349d7156501f05270d494d6002ee03a0860b6e77b46d17effcab027b753
> .echo off
> 1|4
> 1|5
> 2|0
> 2|1
> 2|2
> 2|3
> 2|4
> 2|5
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generate_series can theoretically behave differently in SQLite 3.23.0

petern
In reply to this post by E.Pasma
I think your left join reduction regression change happens on any vtable
hidden column filter reference to an outer scope column.  A CTE duplicates
your finding below.

SQLite 3.23.0 2018-04-02 11:04:16 736b53f57f70b23172c30880186dce
7ad9baa3b74e3838cae5847cffb98f5cd2
sqlite> WITH t1(x) AS (VALUES(1),(2)), t2(x,y) AS (VALUES(1,4)) SELECT
x,value FROM t1 LEFT JOIN t2 USING(x) JOIN generate_series ON start=t2.y
AND stop=5;
1,4
1,5

SQLite 3.22.0 2018-01-22 18:45:57 0c55d179733b46d8d0ba4d88e01a25
e10677046ee3da1d5b1581e86726f2alt1
sqlite> WITH t1(x) AS (VALUES(1),(2)), t2(x,y) AS (VALUES(1,4)) SELECT
x,value FROM t1 LEFT JOIN t2 USING(x) JOIN generate_series ON start=t2.y
AND stop=5;
x,value
1,4
1,5
2,0
2,1
2,2
2,3
2,4
2,5

Peter


On Tue, Apr 3, 2018 at 12:22 AM, E.Pasma <[hidden email]> wrote:

> Hello, below is a case where the generate_series virtual table behaves
> differently with SQLite 3.23. It is a purely artificial case. Does anyoone
> have an opinion about it?
>
> The issue comes from the fact that generate_series does not check if a
> supposed integer input value happens to be NULL. And continues as if zero
> is passed in. This way a WHERE condition like
>
> <VT column>=<expression that gives NULL>
>
> can be true. Regular tables require "IS" instead of "=" here.
> Theoretically the left join reduction prover can now be misleaded. As is
> the case below,
>
> Is this a (obscure) bug in generate_series?
>
> E Pasma
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generate_series can theoretically behave differently in SQLite 3.23.0

Richard Hipp-3
In reply to this post by Richard Hipp-3
On 4/3/18, Richard Hipp <[hidden email]> wrote:
>
> Probably there will be a 3.23.1 patch release later today.
>

Or, maybe not.

If the series.c file is compiled with -DSQLITE_SERIES_CONSTRAINT_VERIFY=1 then
the generate_series() virtual table behaves correctly, and Edzard's
example gives a correct answer both before and after the new LEFT JOIN
strength reduction optimization is added.
Without the -DSQLITE_SERIES_CONSTRAINT_VERIFY=1 compile-time option,
the test case consistently gives the wrong answer.  Here is an
alternative test case:

  WITH
    t1(x) AS (VALUES(1),(2)),
    t2(y,z) AS (VALUES(2,1))
  SELECT * FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series
   WHERE start=z AND stop=2;

The query above should return only two rows:

   2 2 1 1
   2 2 1 2

But it instead returns 5 rows, because the generate_series virtual
table is telling the code generate that it does not need to check the
start=z constraint.  When the start=z constraint is not checked, then
indeed 5 rows are generated because the query becomes equivalent to
this:

  WITH
    t1(x) AS (VALUES(1),(2)),
    t2(y,z) AS (VALUES(2,1))
  SELECT * FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series
   WHERE start=coalesce(z,0) AND stop=2;

I'm testing a patch now that causes the LEFT JOIN strength reduction
optimization to assume that NULL arguments to a virtual table
constraint can return a TRUE result.  But I'm wondering, since this is
really a work-around to problems in virtual table implementations, if
this change warrants a patch release?

Your thoughts?

Should we issue 3.23.1 just to work around dodgy virtual table
implementations?  Or should we just check-in the change and let those
who want to continue using their dodgy virtual tables either patch the
issue themselves or wait for 3.24.0?

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generate_series can theoretically behave differently in SQLite 3.23.0

E.Pasma
Richard Hipp wrote:
...

>
> I'm testing a patch now that causes the LEFT JOIN strength reduction
> optimization to assume that NULL arguments to a virtual table
> constraint can return a TRUE result.  But I'm wondering, since this is
> really a work-around to problems in virtual table implementations, if
> this change warrants a patch release?
>
> Your thoughts?
>
> Should we issue 3.23.1 just to work around dodgy virtual table
> implementations?  Or should we just check-in the change and let those
> who want to continue using their dodgy virtual tables either patch the
> issue themselves or wait for 3.24.0?
>
3.24




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generate_series can theoretically behave differently in SQLite 3.23.0

petern
In reply to this post by Richard Hipp-3
Two points.

1. If any NULL constraint on any vtable is to return no rows, then why does
the vtable code get called in the first place?

2. The shipped extension series.c had some NULL default constraint
behavior.  Are such constraint defaults now bad form?

In other words, the LEFT strength reduction compatibility refinement you
just made fixed the problem.

Since it now works, why is series.c also being changed here?
https://www.sqlite.org/src/info/3328e828e28ca719

[Also, why the zeal on series.c but no effort toward an actual segmentation
fault I reported in another shipped extension, eval.c?]

Peter



On Tue, Apr 3, 2018 at 6:50 AM, Richard Hipp <[hidden email]> wrote:

> On 4/3/18, Richard Hipp <[hidden email]> wrote:
> >
> > Probably there will be a 3.23.1 patch release later today.
> >
>
> Or, maybe not.
>
> If the series.c file is compiled with -DSQLITE_SERIES_CONSTRAINT_VERIFY=1
> then
> the generate_series() virtual table behaves correctly, and Edzard's
> example gives a correct answer both before and after the new LEFT JOIN
> strength reduction optimization is added.
> Without the -DSQLITE_SERIES_CONSTRAINT_VERIFY=1 compile-time option,
> the test case consistently gives the wrong answer.  Here is an
> alternative test case:
>
>   WITH
>     t1(x) AS (VALUES(1),(2)),
>     t2(y,z) AS (VALUES(2,1))
>   SELECT * FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series
>    WHERE start=z AND stop=2;
>
> The query above should return only two rows:
>
>    2 2 1 1
>    2 2 1 2
>
> But it instead returns 5 rows, because the generate_series virtual
> table is telling the code generate that it does not need to check the
> start=z constraint.  When the start=z constraint is not checked, then
> indeed 5 rows are generated because the query becomes equivalent to
> this:
>
>   WITH
>     t1(x) AS (VALUES(1),(2)),
>     t2(y,z) AS (VALUES(2,1))
>   SELECT * FROM t1 LEFT JOIN t2 ON x=y JOIN generate_series
>    WHERE start=coalesce(z,0) AND stop=2;
>
> I'm testing a patch now that causes the LEFT JOIN strength reduction
> optimization to assume that NULL arguments to a virtual table
> constraint can return a TRUE result.  But I'm wondering, since this is
> really a work-around to problems in virtual table implementations, if
> this change warrants a patch release?
>
> Your thoughts?
>
> Should we issue 3.23.1 just to work around dodgy virtual table
> implementations?  Or should we just check-in the change and let those
> who want to continue using their dodgy virtual tables either patch the
> issue themselves or wait for 3.24.0?
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: generate_series can theoretically behave differently in SQLite 3.23.0

E.Pasma
In reply to this post by E.Pasma
Petern wrote:
> I think your left join reduction regression change happens on any  
> vtable
> hidden column filter reference to an outer scope column.  A CTE  
> duplicates
> your finding below...
Hello Peter, from your message I realize that generate_series is no  
longer essentiall since SQLite offers CTE's. The query below uses no  
extension (and is not relevant for the issue). Hope I did not miss  
anything. E. Pasma

WITH
     t1(x) AS (VALUES(1),(2)),
     t2(x,y) AS (VALUES(1,4)),
     rec_serie AS (
         SELECT x, y AS value
         FROM t1 JOIN t2 USING (x)
         UNION ALL
         SELECT x, value+1
         FROM    rec_serie
         WHERE   value<5
         ORDER BY x
     )
SELECT * FROM rec_serie
;











_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users