Worked in 3.8.9, but now "recursive aggregate queries not supported"

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Worked in 3.8.9, but now "recursive aggregate queries not supported"

Aurel Wisse
On 9/11/15, Richard Hipp <drh at sqlite.org> wrote:

>On 9/11/15, Aurel Wisse <aurel.wisse at gmail.com> wrote:

>> I used a recursive aggregate query in 3.8.9 and it worked very well. Just

>> upgraded to 3.11.1 and the query is broken.

>>

>> This seems to be directly related to

>>

>> Check-in [6d2999af]: Do not allow recursive CTEs that use aggregate
queries

>> in the recursive part.

>>

>> It worked, and now it is disabled. Why ?

>>

>

>If it worked for you before, that was luck.  Aggregate queries in a

>recursive CTE have never worked for the general case.  In fact, I

>didn't realize that there were any cases where it did actually work.

>Can you share with use an example of an aggregate query in a recursive

>CTE that gives a reasonable answer?


Here is the example. It doesn't use an actual aggregation (sum, count), but
the

"min" aggregate function.


From a list of dates, create a lookup table with two

columns: the original date (calc_date) and the date after calc_date (tmw).


CREATE TABLE offsets AS

WITH RECURSIVE dtoff(calc_date, tmw) AS (

WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)

SELECT NULL, min(cd) FROM dates

UNION

SELECT dtoff.tmw, min(dates.cd) FROM dtoff, dates

WHERE dates.cd > dtoff.tmw AND dtoff.tmw NOTNULL

AND strftime('%%w', dates.cd)<>'0')

SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;


--
Aurel Wisse
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

E.Pasma

Op 12 sep 2015, om 13:48 heeft Aurel Wisse het volgende geschreven:

>
>
> Here is the example. It doesn't use an actual aggregation (sum,  
> count), but
> the
>
> "min" aggregate function.
>
>
> From a list of dates, create a lookup table with two
>
> columns: the original date (calc_date) and the date after calc_date  
> (tmw).
>
>
> CREATE TABLE offsets AS
>
> WITH RECURSIVE dtoff(calc_date, tmw) AS (
>
> WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)
>
> SELECT NULL, min(cd) FROM dates
>
> UNION
>
> SELECT dtoff.tmw, min(dates.cd) FROM dtoff, dates
>
> WHERE dates.cd > dtoff.tmw AND dtoff.tmw NOTNULL
>
> AND strftime('%%w', dates.cd)<>'0')
>
> SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>


The query below should work.
I hope that that is not a matter of good luck?

CREATE TABLE securities(calc_date)
;
WITH RECURSIVE dtoff(calc_date, tmw) AS (

WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)

SELECT NULL, min(cd) FROM dates

UNION

SELECT dtoff.tmw,
         -- begin change 150912
         (
     SELECT min(dates.cd)
     FROM dates
     WHERE dates.cd > dtoff.tmw
     AND strftime('%%w', dates.cd)<>'0'
         )
         -- end change 150912
FROM dtoff
WHERE dtoff.tmw NOTNULL
     )
SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;

~

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Aurel Wisse
Yes it works, but it takes forever. By comparison:

Original recursive query: 7-8 seconds (SQLite 3.8.10.1)
Non recursive query (see below): 18-19 seconds. (SQLite 3.8.10.1)
Proposed query: Not finished after 10 minutes. I am cancelling.

Non recursive query with same result:

CREATE TEMP TABLE dateoff AS
WITH dates AS (SELECT DISTINCT calc_date FROM wts_next_day
WHERE strftime('%w', calc_date)<>'0')
SELECT dt1.calc_date AS calc_date, min(dt2.calc_date) AS tmw
FROM dates AS dt1, dates AS dt2
ON dt2.calc_date > dt1.calc_date
GROUP BY dt1.calc_date;

So the recursive option is still way ahead.


On Sat, Sep 12, 2015 at 10:01 AM, E.Pasma <[hidden email]> wrote:

>
> Op 12 sep 2015, om 13:48 heeft Aurel Wisse het volgende geschreven:
>
>>
>>
>> Here is the example. It doesn't use an actual aggregation (sum, count),
>> but
>> the
>>
>> "min" aggregate function.
>>
>>
>> From a list of dates, create a lookup table with two
>>
>> columns: the original date (calc_date) and the date after calc_date (tmw).
>>
>>
>> CREATE TABLE offsets AS
>>
>> WITH RECURSIVE dtoff(calc_date, tmw) AS (
>>
>> WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)
>>
>> SELECT NULL, min(cd) FROM dates
>>
>> UNION
>>
>> SELECT dtoff.tmw, min(dates.cd) FROM dtoff, dates
>>
>> WHERE dates.cd > dtoff.tmw AND dtoff.tmw NOTNULL
>>
>> AND strftime('%%w', dates.cd)<>'0')
>>
>> SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>>
>>
>
> The query below should work.
> I hope that that is not a matter of good luck?
>
> CREATE TABLE securities(calc_date)
> ;
> WITH RECURSIVE dtoff(calc_date, tmw) AS (
>
> WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)
>
> SELECT NULL, min(cd) FROM dates
>
> UNION
>
> SELECT dtoff.tmw,
>         -- begin change 150912
>         (
>     SELECT min(dates.cd)
>     FROM dates
>     WHERE dates.cd > dtoff.tmw
>     AND strftime('%%w', dates.cd)<>'0'
>         )
>         -- end change 150912
> FROM dtoff
> WHERE dtoff.tmw NOTNULL
>     )
> SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>
> ~
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



--
Aurel Wisse
(514) 814-5421
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Luuk
On 12-09-15 16:36, Aurel Wisse wrote:

> Yes it works, but it takes forever. By comparison:
>
> Original recursive query: 7-8 seconds (SQLite 3.8.10.1)
> Non recursive query (see below): 18-19 seconds. (SQLite 3.8.10.1)
> Proposed query: Not finished after 10 minutes. I am cancelling.
>
> Non recursive query with same result:
>
> CREATE TEMP TABLE dateoff AS
> WITH dates AS (SELECT DISTINCT calc_date FROM wts_next_day
> WHERE strftime('%w', calc_date)<>'0')
> SELECT dt1.calc_date AS calc_date, min(dt2.calc_date) AS tmw
> FROM dates AS dt1, dates AS dt2
> ON dt2.calc_date > dt1.calc_date
> GROUP BY dt1.calc_date;
>
> So the recursive option is still way ahead.
>
>
> On Sat, Sep 12, 2015 at 10:01 AM, E.Pasma <[hidden email]> wrote:
>
>> Op 12 sep 2015, om 13:48 heeft Aurel Wisse het volgende geschreven:
>>
>>>
>>> Here is the example. It doesn't use an actual aggregation (sum, count),
>>> but
>>> the
>>>
>>> "min" aggregate function.
>>>
>>>
>>>  From a list of dates, create a lookup table with two
>>>
>>> columns: the original date (calc_date) and the date after calc_date (tmw).
>>>
>>>
>>> CREATE TABLE offsets AS
>>>
>>> WITH RECURSIVE dtoff(calc_date, tmw) AS (
>>>
>>> WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)
>>>
>>> SELECT NULL, min(cd) FROM dates
>>>
>>> UNION
>>>
>>> SELECT dtoff.tmw, min(dates.cd) FROM dtoff, dates
>>>
>>> WHERE dates.cd > dtoff.tmw AND dtoff.tmw NOTNULL
>>>
>>> AND strftime('%%w', dates.cd)<>'0')
>>>
>>> SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>>>
>>>
>> The query below should work.
>> I hope that that is not a matter of good luck?
>>
>> CREATE TABLE securities(calc_date)
>> ;
>> WITH RECURSIVE dtoff(calc_date, tmw) AS (
>>
>> WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)
>>
>> SELECT NULL, min(cd) FROM dates
>>
>> UNION
>>
>> SELECT dtoff.tmw,
>>          -- begin change 150912
>>          (
>>      SELECT min(dates.cd)
>>      FROM dates
>>      WHERE dates.cd > dtoff.tmw
>>      AND strftime('%%w', dates.cd)<>'0'
>>          )
>>          -- end change 150912
>> FROM dtoff
>> WHERE dtoff.tmw NOTNULL
>>      )
>> SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>>
>> ~
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
What about this one? i cannot test because i do not have (enough)
data..... ;)


WITH RECURSIVE dtoff(calc_date, tmw) AS (
   WITH dates(cd) AS (SELECT DISTINCT calc_date FROM securities)
   SELECT NULL, min(cd) FROM dates

   UNION

   SELECT dtoff.tmw, (dates.cd) FROM dtoff, dates
   WHERE dates.cd > dtoff.tmw AND dtoff.tmw NOTNULL
   AND strftime('%%w', dates.cd)<>'0')

SELECT * from dtoff  d1
WHERE calc_date NOTNULL
   AND (SELECT cOUNT(*) FROM dtoff d2 WHERE d2.calc_date=d1.calc_date
and d2.tmw>d1.tmw)=0
ORDER BY calc_date;
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Richard Hipp-3
In reply to this post by Aurel Wisse
On 9/12/15, Aurel Wisse <[hidden email]> wrote:
> On 9/11/15, Richard Hipp <drh at sqlite.org> wrote:
>
> Here is the example.

Have you considered using code like the following instead:

-- Create a table "securities(calc_date)" and fill it with lots
-- with lots of dates in a crazy order.
--
CREATE TABLE securities(calc_date);
WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<1000)
INSERT INTO securities(calc_date)
   SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0) FROM c;
CREATE INDEX securities_calcdate ON securities(calc_date);


-- Compute the "offsets(calc_date, tmw)" which has one entry for each
-- calc_date in securities and for which tmw is the first non-Sunday date
-- in securities that comes after calc_date.
--
CREATE TABLE offsets AS
  WITH RECURSIVE dtoff(calc_date, tmw) AS (
    SELECT NULL, (SELECT min(calc_date) FROM securities)
    UNION
    SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities
                        WHERE securities.calc_date>dtoff.tmw
                          AND strftime('%w',securities.calc_date)<>'0')
      FROM dtoff
  )
  SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
SELECT * FROM offsets;


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Aurel Wisse
@Luuk : I tried your query and I cancelled after 9 minutes when it wasn't
finished.

@Richard : The query isn't done after 15 minutes. I added a small
modification, reinserting the WITH clause inside the recursive WITH (and
using dates instead of securities in the remainder of the query) :

CREATE TEMP TABLE offsets AS

  WITH RECURSIVE dtoff(calc_date, tmw) AS (

WITH dates AS (SELECT DISTINCT calc_date FROM securities)

    SELECT NULL, (SELECT min(calc_date) FROM dates)
...
This is inevitable as "securities" is a table with 6.2 million rows. This
is the schema:

CREATE TABLE security (calc_date TEXT, security_code INT, mx_full REAL,
mx_adj REAL, mx_std REAL, fu_full REAL, fu_adj REAL, few_full REAL, few_adj
REAL);


There are about 4000 distinct calc_date in securities. I want to avoid an
intermediate table with distinct dates. If I were to use an intermediate
table, the offset would be trivial to create on an ordered version of
dates, self joined with the JOIN clause d1.rowid+1=d2.rowid.


The expected output is (first five rows):

2000-11-30|2000-12-01

2000-12-01|2000-12-04

2000-12-04|2000-12-05

2000-12-05|2000-12-06

2000-12-06|2000-12-07


I have the impression that nontrivial SELECT queries in result columns are
quite expensive.


The fact of the matter is that the original recursive query, up to this
point, is the most efficient solution with the non recursive query second.
Richard: Any chance aggregate recursive queries can be reactivated ?


Thanks,


Aurel



On Sat, Sep 12, 2015 at 12:25 PM, Richard Hipp <[hidden email]> wrote:

> On 9/12/15, Aurel Wisse <[hidden email]> wrote:
> > On 9/11/15, Richard Hipp <drh at sqlite.org> wrote:
> >
> > Here is the example.
>
> Have you considered using code like the following instead:
>
> -- Create a table "securities(calc_date)" and fill it with lots
> -- with lots of dates in a crazy order.
> --
> CREATE TABLE securities(calc_date);
> WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<1000)
> INSERT INTO securities(calc_date)
>    SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0) FROM
> c;
> CREATE INDEX securities_calcdate ON securities(calc_date);
>
>
> -- Compute the "offsets(calc_date, tmw)" which has one entry for each
> -- calc_date in securities and for which tmw is the first non-Sunday date
> -- in securities that comes after calc_date.
> --
> CREATE TABLE offsets AS
>   WITH RECURSIVE dtoff(calc_date, tmw) AS (
>     SELECT NULL, (SELECT min(calc_date) FROM securities)
>     UNION
>     SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities
>                         WHERE securities.calc_date>dtoff.tmw
>                           AND strftime('%w',securities.calc_date)<>'0')
>       FROM dtoff
>   )
>   SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
> SELECT * FROM offsets;
>
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



--
Aurel Wisse
(514) 814-5421
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Simon Slavin-3

> On 12 Sep 2015, at 9:12pm, Aurel Wisse <[hidden email]> wrote:
>
> CREATE TEMP TABLE offsets AS
>
>  WITH RECURSIVE dtoff(calc_date, tmw) AS (
>
> WITH dates AS (SELECT DISTINCT calc_date FROM securities)
>
>    SELECT NULL, (SELECT min(calc_date) FROM dates)
> ...
> This is inevitable as "securities" is a table with 6.2 million rows. This
> is the schema:
>
> CREATE TABLE security (calc_date TEXT, security_code INT, mx_full REAL,
> mx_adj REAL, mx_std REAL, fu_full REAL, fu_adj REAL, few_full REAL, few_adj
> REAL);
>
>
> There are about 4000 distinct calc_date in securities.

Do you have an index on calc_date ?  If not, make one.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Simon Slavin-3

On 12 Sep 2015, at 9:14pm, Simon Slavin <[hidden email]> wrote:

> Do you have an index on calc_date ?  If not, make one.

Uh ... and then run ANALYZE.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Luuk
In reply to this post by Aurel Wisse


On 12-09-15 22:12, Aurel Wisse wrote:
> @Luuk : I tried your query and I cancelled after 9 minutes when it wasn't
> finished.
>
yes, it was/IS bad...

i should stick to non-recursive queries

select
     s1.calc_date,
     (select min(s2.calc_date)
         from securities s2
         where s2.calc_date>s1.calc_date)
from securities s1
order by s1.calc_date;

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Aurel Wisse
In reply to this post by Simon Slavin-3
@Simon : Yes there is an index. The raw query

CREATE TEMP TABLE uniqdt AS SELECT DISTINCT calc_date FROM securities;

runs exactly 1.2 seconds.

@Luuk : Yes, there is a non recursive solution, but it is slower than the
recursive solution, by a factor of about 2.


On Sat, Sep 12, 2015 at 4:14 PM, Simon Slavin <[hidden email]> wrote:

>
> On 12 Sep 2015, at 9:14pm, Simon Slavin <[hidden email]> wrote:
>
> > Do you have an index on calc_date ?  If not, make one.
>
> Uh ... and then run ANALYZE.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



--
Aurel Wisse
(514) 814-5421
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Richard Hipp-3
In reply to this post by Aurel Wisse
On 9/12/15, Aurel Wisse <[hidden email]> wrote:
> @Luuk : I tried your query and I cancelled after 9 minutes when it wasn't
> finished.
>
> @Richard : The query isn't done after 15 minutes. I added a small
> modification, reinserting the WITH clause inside the recursive WITH (and
> using dates instead of securities in the remainder of the query) :

Did you create the index on securities(calc_date) first, as shown in
my example.  I should have highlighted that point - the index is
important for performance.

>
> CREATE TEMP TABLE offsets AS
>
>   WITH RECURSIVE dtoff(calc_date, tmw) AS (
>
> WITH dates AS (SELECT DISTINCT calc_date FROM securities)
>
>     SELECT NULL, (SELECT min(calc_date) FROM dates)
> ...
> This is inevitable as "securities" is a table with 6.2 million rows. This
> is the schema:
>
> CREATE TABLE security (calc_date TEXT, security_code INT, mx_full REAL,
> mx_adj REAL, mx_std REAL, fu_full REAL, fu_adj REAL, few_full REAL, few_adj
> REAL);
>
>
> There are about 4000 distinct calc_date in securities. I want to avoid an
> intermediate table with distinct dates. If I were to use an intermediate
> table, the offset would be trivial to create on an ordered version of
> dates, self joined with the JOIN clause d1.rowid+1=d2.rowid.
>
>
> The expected output is (first five rows):
>
> 2000-11-30|2000-12-01
>
> 2000-12-01|2000-12-04
>
> 2000-12-04|2000-12-05
>
> 2000-12-05|2000-12-06
>
> 2000-12-06|2000-12-07
>
>
> I have the impression that nontrivial SELECT queries in result columns are
> quite expensive.
>
>
> The fact of the matter is that the original recursive query, up to this
> point, is the most efficient solution with the non recursive query second.
> Richard: Any chance aggregate recursive queries can be reactivated ?
>
>
> Thanks,
>
>
> Aurel
>
>
>
> On Sat, Sep 12, 2015 at 12:25 PM, Richard Hipp <[hidden email]> wrote:
>
>> On 9/12/15, Aurel Wisse <[hidden email]> wrote:
>> > On 9/11/15, Richard Hipp <drh at sqlite.org> wrote:
>> >
>> > Here is the example.
>>
>> Have you considered using code like the following instead:
>>
>> -- Create a table "securities(calc_date)" and fill it with lots
>> -- with lots of dates in a crazy order.
>> --
>> CREATE TABLE securities(calc_date);
>> WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<1000)
>> INSERT INTO securities(calc_date)
>>    SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0)
>> FROM
>> c;
>> CREATE INDEX securities_calcdate ON securities(calc_date);
>>
>>
>> -- Compute the "offsets(calc_date, tmw)" which has one entry for each
>> -- calc_date in securities and for which tmw is the first non-Sunday date
>> -- in securities that comes after calc_date.
>> --
>> CREATE TABLE offsets AS
>>   WITH RECURSIVE dtoff(calc_date, tmw) AS (
>>     SELECT NULL, (SELECT min(calc_date) FROM securities)
>>     UNION
>>     SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities
>>                         WHERE securities.calc_date>dtoff.tmw
>>                           AND strftime('%w',securities.calc_date)<>'0')
>>       FROM dtoff
>>   )
>>   SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>> SELECT * FROM offsets;
>>
>>
>> --
>> D. Richard Hipp
>> [hidden email]
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> Aurel Wisse
> (514) 814-5421
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Aurel Wisse
There is an index on securities (6.2 million rows). On the dates query from
the with clause,
WITH dates AS (SELECT DISTINCT calc_date FROM securities), there can be no
index.

In your example, there are 1000 rows of indexed dates. The runtime of the
full example (four steps) is:

Run Time: real 0.002 user 0.000484 sys 0.000877

Run Time: real 0.004 user 0.003905 sys 0.000404

Run Time: real 0.001 user 0.001174 sys 0.000259

Run Time: real 0.007 user 0.006640 sys 0.000405

which is instantaneous, but unfortunately not reproducible with my
datasource. Actually, even without the index the execution time of your
example is still very reasonable:

Run Time: real 0.000 user 0.000225 sys 0.000048

Run Time: real 0.004 user 0.003767 sys 0.000181

Run Time: real 0.409 user 0.401034 sys 0.001259


On Sat, Sep 12, 2015 at 4:27 PM, Richard Hipp <[hidden email]> wrote:

> On 9/12/15, Aurel Wisse <[hidden email]> wrote:
> > @Luuk : I tried your query and I cancelled after 9 minutes when it wasn't
> > finished.
> >
> > @Richard : The query isn't done after 15 minutes. I added a small
> > modification, reinserting the WITH clause inside the recursive WITH (and
> > using dates instead of securities in the remainder of the query) :
>
> Did you create the index on securities(calc_date) first, as shown in
> my example.  I should have highlighted that point - the index is
> important for performance.
>
> >
> > CREATE TEMP TABLE offsets AS
> >
> >   WITH RECURSIVE dtoff(calc_date, tmw) AS (
> >
> > WITH dates AS (SELECT DISTINCT calc_date FROM securities)
> >
> >     SELECT NULL, (SELECT min(calc_date) FROM dates)
> > ...
> > This is inevitable as "securities" is a table with 6.2 million rows. This
> > is the schema:
> >
> > CREATE TABLE security (calc_date TEXT, security_code INT, mx_full REAL,
> > mx_adj REAL, mx_std REAL, fu_full REAL, fu_adj REAL, few_full REAL,
> few_adj
> > REAL);
> >
> >
> > There are about 4000 distinct calc_date in securities. I want to avoid an
> > intermediate table with distinct dates. If I were to use an intermediate
> > table, the offset would be trivial to create on an ordered version of
> > dates, self joined with the JOIN clause d1.rowid+1=d2.rowid.
> >
> >
> > The expected output is (first five rows):
> >
> > 2000-11-30|2000-12-01
> >
> > 2000-12-01|2000-12-04
> >
> > 2000-12-04|2000-12-05
> >
> > 2000-12-05|2000-12-06
> >
> > 2000-12-06|2000-12-07
> >
> >
> > I have the impression that nontrivial SELECT queries in result columns
> are
> > quite expensive.
> >
> >
> > The fact of the matter is that the original recursive query, up to this
> > point, is the most efficient solution with the non recursive query
> second.
> > Richard: Any chance aggregate recursive queries can be reactivated ?
> >
> >
> > Thanks,
> >
> >
> > Aurel
> >
> >
> >
> > On Sat, Sep 12, 2015 at 12:25 PM, Richard Hipp <[hidden email]> wrote:
> >
> >> On 9/12/15, Aurel Wisse <[hidden email]> wrote:
> >> > On 9/11/15, Richard Hipp <drh at sqlite.org> wrote:
> >> >
> >> > Here is the example.
> >>
> >> Have you considered using code like the following instead:
> >>
> >> -- Create a table "securities(calc_date)" and fill it with lots
> >> -- with lots of dates in a crazy order.
> >> --
> >> CREATE TABLE securities(calc_date);
> >> WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<1000)
> >> INSERT INTO securities(calc_date)
> >>    SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0)
> >> FROM
> >> c;
> >> CREATE INDEX securities_calcdate ON securities(calc_date);
> >>
> >>
> >> -- Compute the "offsets(calc_date, tmw)" which has one entry for each
> >> -- calc_date in securities and for which tmw is the first non-Sunday
> date
> >> -- in securities that comes after calc_date.
> >> --
> >> CREATE TABLE offsets AS
> >>   WITH RECURSIVE dtoff(calc_date, tmw) AS (
> >>     SELECT NULL, (SELECT min(calc_date) FROM securities)
> >>     UNION
> >>     SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities
> >>                         WHERE securities.calc_date>dtoff.tmw
> >>                           AND strftime('%w',securities.calc_date)<>'0')
> >>       FROM dtoff
> >>   )
> >>   SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
> >> SELECT * FROM offsets;
> >>
> >>
> >> --
> >> D. Richard Hipp
> >> [hidden email]
> >> _______________________________________________
> >> sqlite-users mailing list
> >> [hidden email]
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >
> >
> >
> > --
> > Aurel Wisse
> > (514) 814-5421
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



--
Aurel Wisse
(514) 814-5421
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Aurel Wisse
The fastest solution is actually the temporary table:

CREATE TEMP TABLE uniqdt AS SELECT DISTINCT calc_date FROM securities

WHERE strftime('%w', calc_date)<>'0' ORDER BY calc_date;

CREATE TEMP TABLE dtotemp AS

SELECT ud1.calc_date AS calc_date, ud2.calc_date AS tmw

FROM uniqdt AS ud1 INNER JOIN uniqdt AS ud2

ON ud2.rowid = ud1.rowid+1

ORDER BY calc_date;

DROP TABLE uniqdt;


Execution time:

Run Time: real 3.601 user 3.342178 sys 0.246156

Run Time: real 0.007 user 0.005320 sys 0.001221

Run Time: real 0.000 user 0.000218 sys 0.000047

Still Richard : How about allowing recursive aggregate queries again ?


On Sat, Sep 12, 2015 at 4:45 PM, Aurel Wisse <[hidden email]> wrote:

> There is an index on securities (6.2 million rows). On the dates query
> from the with clause,
> WITH dates AS (SELECT DISTINCT calc_date FROM securities), there can be no
> index.
>
> In your example, there are 1000 rows of indexed dates. The runtime of the
> full example (four steps) is:
>
> Run Time: real 0.002 user 0.000484 sys 0.000877
>
> Run Time: real 0.004 user 0.003905 sys 0.000404
>
> Run Time: real 0.001 user 0.001174 sys 0.000259
>
> Run Time: real 0.007 user 0.006640 sys 0.000405
>
> which is instantaneous, but unfortunately not reproducible with my
> datasource. Actually, even without the index the execution time of your
> example is still very reasonable:
>
> Run Time: real 0.000 user 0.000225 sys 0.000048
>
> Run Time: real 0.004 user 0.003767 sys 0.000181
>
> Run Time: real 0.409 user 0.401034 sys 0.001259
>
>
> On Sat, Sep 12, 2015 at 4:27 PM, Richard Hipp <[hidden email]> wrote:
>
>> On 9/12/15, Aurel Wisse <[hidden email]> wrote:
>> > @Luuk : I tried your query and I cancelled after 9 minutes when it
>> wasn't
>> > finished.
>> >
>> > @Richard : The query isn't done after 15 minutes. I added a small
>> > modification, reinserting the WITH clause inside the recursive WITH (and
>> > using dates instead of securities in the remainder of the query) :
>>
>> Did you create the index on securities(calc_date) first, as shown in
>> my example.  I should have highlighted that point - the index is
>> important for performance.
>>
>> >
>> > CREATE TEMP TABLE offsets AS
>> >
>> >   WITH RECURSIVE dtoff(calc_date, tmw) AS (
>> >
>> > WITH dates AS (SELECT DISTINCT calc_date FROM securities)
>> >
>> >     SELECT NULL, (SELECT min(calc_date) FROM dates)
>> > ...
>> > This is inevitable as "securities" is a table with 6.2 million rows.
>> This
>> > is the schema:
>> >
>> > CREATE TABLE security (calc_date TEXT, security_code INT, mx_full REAL,
>> > mx_adj REAL, mx_std REAL, fu_full REAL, fu_adj REAL, few_full REAL,
>> few_adj
>> > REAL);
>> >
>> >
>> > There are about 4000 distinct calc_date in securities. I want to avoid
>> an
>> > intermediate table with distinct dates. If I were to use an intermediate
>> > table, the offset would be trivial to create on an ordered version of
>> > dates, self joined with the JOIN clause d1.rowid+1=d2.rowid.
>> >
>> >
>> > The expected output is (first five rows):
>> >
>> > 2000-11-30|2000-12-01
>> >
>> > 2000-12-01|2000-12-04
>> >
>> > 2000-12-04|2000-12-05
>> >
>> > 2000-12-05|2000-12-06
>> >
>> > 2000-12-06|2000-12-07
>> >
>> >
>> > I have the impression that nontrivial SELECT queries in result columns
>> are
>> > quite expensive.
>> >
>> >
>> > The fact of the matter is that the original recursive query, up to this
>> > point, is the most efficient solution with the non recursive query
>> second.
>> > Richard: Any chance aggregate recursive queries can be reactivated ?
>> >
>> >
>> > Thanks,
>> >
>> >
>> > Aurel
>> >
>> >
>> >
>> > On Sat, Sep 12, 2015 at 12:25 PM, Richard Hipp <[hidden email]> wrote:
>> >
>> >> On 9/12/15, Aurel Wisse <[hidden email]> wrote:
>> >> > On 9/11/15, Richard Hipp <drh at sqlite.org> wrote:
>> >> >
>> >> > Here is the example.
>> >>
>> >> Have you considered using code like the following instead:
>> >>
>> >> -- Create a table "securities(calc_date)" and fill it with lots
>> >> -- with lots of dates in a crazy order.
>> >> --
>> >> CREATE TABLE securities(calc_date);
>> >> WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<1000)
>> >> INSERT INTO securities(calc_date)
>> >>    SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0)
>> >> FROM
>> >> c;
>> >> CREATE INDEX securities_calcdate ON securities(calc_date);
>> >>
>> >>
>> >> -- Compute the "offsets(calc_date, tmw)" which has one entry for each
>> >> -- calc_date in securities and for which tmw is the first non-Sunday
>> date
>> >> -- in securities that comes after calc_date.
>> >> --
>> >> CREATE TABLE offsets AS
>> >>   WITH RECURSIVE dtoff(calc_date, tmw) AS (
>> >>     SELECT NULL, (SELECT min(calc_date) FROM securities)
>> >>     UNION
>> >>     SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities
>> >>                         WHERE securities.calc_date>dtoff.tmw
>> >>                           AND strftime('%w',securities.calc_date)<>'0')
>> >>       FROM dtoff
>> >>   )
>> >>   SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>> >> SELECT * FROM offsets;
>> >>
>> >>
>> >> --
>> >> D. Richard Hipp
>> >> [hidden email]
>> >> _______________________________________________
>> >> sqlite-users mailing list
>> >> [hidden email]
>> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >>
>> >
>> >
>> >
>> > --
>> > Aurel Wisse
>> > (514) 814-5421
>> > _______________________________________________
>> > sqlite-users mailing list
>> > [hidden email]
>> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >
>>
>>
>> --
>> D. Richard Hipp
>> [hidden email]
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> Aurel Wisse
> (514) 814-5421
>



--
Aurel Wisse
(514) 814-5421
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

R Smith


On 2015-09-12 10:58 PM, Aurel Wisse wrote:
> The fastest solution is actually the temporary table:

Glad you found a faster solution.

> Still Richard : How about allowing recursive aggregate queries again ?

You say "allow" like it's something that worked and they blocked it out
for no good reason.

It didn't work, your use case is a single accidental fringe that
happened to work by virtue of the MIN() operation's character in the
specific query, much in the way that if you throw a handful of seeds on
a pavement and one of them happens to roll all the way over to a mossy
moist area and found it could germinate, it is not a good reason to
promote pavement-seeding - even if you happen to be the observer from
the mossy patch.

Put another way: It isn't good to allow a convention that will "seem" to
do a certain thing or work a certain way, but then give a completely
wrong answer in 99% of cases - all because 1% happens to be correct.

MIN() actually has two implementations, one aggregate and one as a
simple function, might be one could allow MIN() and MAX() if they always
produced the correct results... but then we need test-cases to ensure
they always give correct results, we need to maintain code that caters
for it etc. and all that to achieve a prize which is handled better and
faster by a temp table in the end.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Richard Hipp-3
In reply to this post by Aurel Wisse
On 9/12/15, Aurel Wisse <[hidden email]> wrote:
>
> In your example, there are 1000 rows of indexed dates.
>

It's easy enough to change it to 6300000 or whatever number is closer
to your test case.

I get:

Run Time: real 2.311 user 2.177056 sys 0.131887
Run Time: real 0.164 user 0.159917 sys 0.003997

The first time is for CREATE INDEX and the second is for the CREATE
TABLE offsets.

Test script:

-- Create a table "securities(calc_date)" and fill it with lots
-- with lots of dates in a crazy order.
--
CREATE TABLE securities(calc_date);
WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<6300000)
INSERT INTO securities(calc_date)
   SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0) FROM c;
.timer on
CREATE INDEX securities_calcdate ON securities(calc_date);


-- Compute the "offsets(calc_date, tmw)" which has one entry for each
-- calc_date in securities and for which tmw is the first non-Sunday date
-- in securities that comes after calc_date.
--
CREATE TABLE offsets AS
  WITH RECURSIVE dtoff(calc_date, tmw) AS (
    SELECT NULL, (SELECT min(calc_date) FROM securities)
    UNION
    SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities
                        WHERE securities.calc_date>dtoff.tmw
                          AND strftime('%w',securities.calc_date)<>'0')
      FROM dtoff
  )
  SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;



--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

Aurel Wisse
In reply to this post by R Smith
@R.Smith
I wasn't aware that recursive aggregate queries give wrong answers in 99%
of all general use cases. In that case, it is obviously preferable to
disable it.

On Sat, Sep 12, 2015 at 5:51 PM, R.Smith <[hidden email]> wrote:

>
>
> On 2015-09-12 10:58 PM, Aurel Wisse wrote:
>
>> The fastest solution is actually the temporary table:
>>
>
> Glad you found a faster solution.
>
> Still Richard : How about allowing recursive aggregate queries again ?
>>
>
> You say "allow" like it's something that worked and they blocked it out
> for no good reason.
>
> It didn't work, your use case is a single accidental fringe that happened
> to work by virtue of the MIN() operation's character in the specific query,
> much in the way that if you throw a handful of seeds on a pavement and one
> of them happens to roll all the way over to a mossy moist area and found it
> could germinate, it is not a good reason to promote pavement-seeding - even
> if you happen to be the observer from the mossy patch.
>
> Put another way: It isn't good to allow a convention that will "seem" to
> do a certain thing or work a certain way, but then give a completely wrong
> answer in 99% of cases - all because 1% happens to be correct.
>
> MIN() actually has two implementations, one aggregate and one as a simple
> function, might be one could allow MIN() and MAX() if they always produced
> the correct results... but then we need test-cases to ensure they always
> give correct results, we need to maintain code that caters for it etc. and
> all that to achieve a prize which is handled better and faster by a temp
> table in the end.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



--
Aurel Wisse
(514) 814-5421
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

R Smith


On 2015-09-13 12:40 AM, Aurel Wisse wrote:
> @R.Smith
> I wasn't aware that recursive aggregate queries give wrong answers in 99%
> of all general use cases//...

I apologize - I thought Richard had made that much clear early on in the
exchange and you were simply not recognizing or accepting the fact,
which led to my unnecessarily verbose reply.  (Also, 99% might be an
exaggerated figure, but the principle remains).

Either way, I am happy you have now a query that is even faster to
completion than the original one in dispute. Be sure to post any query
that takes too long, someone here might have a way of speeding it up.

Cheers,
Ryan



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

E.Pasma
>
> On 2015-09-13 12:40 AM, Aurel Wisse wrote:
>> @R.Smith
>> I wasn't aware that recursive aggregate queries give wrong answers  
>> in 99%
>> of all general use cases//...
>
> I apologize - I thought Richard had made that much clear early on in  
> the exchange and you were simply not recognizing or accepting the  
> fact, which led to my unnecessarily verbose reply.  (Also, 99% might  
> be an exaggerated figure, but the principle remains).
>
> Either way, I am happy you have now a query that is even faster to  
> completion than the original one in dispute. Be sure to post any  
> query that takes too long, someone here might have a way of speeding  
> it up.
>
> Cheers,
> Ryan
>
Apart from speed also the results are changed.  Using drh's securities  
table with 1.000 rows, I get the following line counts:

999 - aurel's original version is 3.8.9
1000 - Luuks non-recursive version, same for my own trial
827 -the version with temp table
838 - DRH;s version

And the winner is..



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

E.Pasma
op 13-09-2015 01:45 schreef E.Pasma op [hidden email]:

>>
>> On 2015-09-13 12:40 AM, Aurel Wisse wrote:
>>> @R.Smith
>>> I wasn't aware that recursive aggregate queries give wrong answers
>>> in 99%
>>> of all general use cases//...
>>
>> I apologize - I thought Richard had made that much clear early on in
>> the exchange and you were simply not recognizing or accepting the
>> fact, which led to my unnecessarily verbose reply.  (Also, 99% might
>> be an exaggerated figure, but the principle remains).
>>
>> Either way, I am happy you have now a query that is even faster to
>> completion than the original one in dispute. Be sure to post any
>> query that takes too long, someone here might have a way of speeding
>> it up.
>>
>> Cheers,
>> Ryan
>>
> Apart from speed also the results are changed.  Using drh's securities
> table with 1.000 rows, I get the following line counts:
>
> 999 - aurel's original version is 3.8.9
> 1000 - Luuks non-recursive version, same for my own trial
> 827 -the version with temp table
> 838 - DRH;s version
>
> And the winner is..
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Excuse, it is getting late here:
the temp table version returns just one row less then DRH's version.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Worked in 3.8.9, but now "recursive aggregate queries not supported"

E.Pasma
In reply to this post by R Smith
The rowcounts were:

1) 999 - aurel's original version is 3.8.9
2) 1000 - Luuks non-recursive version, same for my own trial
3) 837 -the version with temp table
4) 838 - DRH;s version

and there is nothing to worry here.
The main difference is explained by a mistake in the format parameter  
to the strftime function. In (1) and (2) that was '%%w' which is  
meaningles. In (4) this is changed to '%w' and that effectively  
filters out the sundays. Thanks DRH for the comments that clarify this  
intention.
The remaining difference is because in (3) the final date is not  
incuded as start date on a seperate line. Still it is already there as  
end date.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12