About the performance of recursive WITH

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

About the performance of recursive WITH

Jean-Luc Hainaut
Hi,

This post concerns a strange (imho) behaviour of recursive WITH update
query as far as execution time is concerned.

I have created (in an "In memory" DB) a table that stores a set of nodes
arranged in a tree structure (actually the skeleton of the contents of a
Windows folder):

    create table GRAPH(Parent int, Child int, Level int);

Column "Level" indicates the depth of the node in the tree. The root
node has Level 0, its children Level 1, and so on.

2690 nodes have been inserted, with "Level" initialized to null:

    insert into GRAPH(Parent,Child) values (null,1); -- root node (no
parent)
    insert into GRAPH(Parent,Child) values (1,9);    -- child of the
root node
    insert into GRAPH(Parent,Child) values (9,10);   -- grand-child of
the root node
    insert into GRAPH(Parent,Child) values (9,11);
    insert into GRAPH(Parent,Child) values (9,12);
    insert into GRAPH(Parent,Child) values (9,13);
    etc.

Considering the size of the table, no indexes have been created.
The distribution of nodes among the levels is fairly "normal":

    +-------+--------+
    | Level | Number |
    +-------+--------+
    | 0     | 1      |
    | 1     | 47     |
    | 2     | 215    |
    | 3     | 638    |
    | 4     | 1010   |
    | 5     | 729    |
    | 6     | 50     |
    +-------+--------+

Now, I would like to compute the "Level" value of all nodes from their
position in the tree. This task immediately suggests a recursive WITH
update:

    with recursive
    HIERARCHY(FromID,ToID,Level) as
       (
       select Parent, Child, '0'
       from   GRAPH where Parent is null
          union all
       select H.ToID, G.Child, H.Level + 1
       from   HIERARCHY H, GRAPH G
       where  H.ToID = G.Parent
       )
    update GRAPH
    set    Level = (select Level from HIERARCHYwhere ToID = GRAPH.Child);

When this query is executed by SQLite 3.16.2, the timer reports an
execution time of 5.522 s.  Adding an index on "Parent" and one on
"Child" just makes things worse (6.381 s.)

I find this figures quite high, so that I try an iterative technique,
which is likely to be close to the execution strategy of the WITH
statement (https://www.sqlite.org/lang_with.html). I translate it for
the CLI shell as follows:

    update GRAPH set Level = 0 where Parent is null;

    update GRAPH set Level = 1
    where  Parent in (select Child from GRAPH where Level = 0);

    update GRAPH set Level = 2
    where  Parent in (select Child from GRAPH where Level = 1);

    update GRAPH set Level = 3
    where  Parent in (select Child from GRAPH where Level = 2);

    update GRAPH set Level = 4
    where  Parent in (select Child from GRAPH where Level = 3);

    update GRAPH set Level = 5
    where  Parent in (select Child from GRAPH where Level = 4);

    update GRAPH set Level = 6
    where  Parent in (select Child from GRAPH where Level = 5);

For this script, I get an execution time of 0.015 s., i.e., nearly 370
times less!

Is there something wrong in my queries? Or is there an optimization
trick for WITH queries by which one could approach the performance of
the iterative version?

The scripts are available here:
https://www.dropbox.com/s/23t4ycftlk0doy1/GRAPH-performance.zip?dl=0

Thanks for any advice

Jean-Luc Hainaut

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: About the performance of recursive WITH

Keith Medcalf

Hmmmm.  With the current head of trunk ...

I run this and the recursive version takes 93 milliseconds vs the iterative version which takes 203 milliseconds.
Creating all possible covering indexes takes 110 milliseconds vs 141 milliseconds.
Creating only used/required covering indexes takes 93 milliseconds vs 124 milliseconds

Can you add .timer on and .eqp on and see if you are getting reasonable query plans?  Also, what version of SQLite are you using?

Is the version you are using creating the necessary automatic covering indexes?

If I manually create indexes I get better and worse results, as expected.  Those follow after these runs.


>timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
TimeThis :    Start Time :  Thu Feb 16 06:47:14 2017


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
TimeThis :    Start Time :  Thu Feb 16 06:47:14 2017
TimeThis :      End Time :  Thu Feb 16 06:47:14 2017
TimeThis :  Elapsed Time :  00:00:00.078


>dir graph.db
2017-02-16  06:47            45,056 graph.db

>timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
TimeThis :    Start Time :  Thu Feb 16 06:47:58 2017

SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
.eqp on
.timer on
update GRAPH
   set Level = null;
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.016 user 0.000000 sys 0.000000

with
HIERARCHY (FromID,ToID,Level) as
(
    select Parent,
           Child,
           '0'
      from GRAPH
     where Parent is null
 union all
    select H.ToID,
           G.Child,
           H.Level + 1
      from HIERARCHY H,
           GRAPH G
     where H.ToID = G.Parent
)
update GRAPH
   set Level = (select Level
                  from HIERARCHY H
                 where ToID = GRAPH.Child);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
--EQP-- 2,0,0,SCAN TABLE GRAPH
--EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
--EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING AUTOMATIC COVERING INDEX (Parent=?)
--EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
Run Time: real 0.046 user 0.000000 sys 0.015625

select Level,
         count(*) as Number
    from GRAPH
group by Level;
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,USE TEMP B-TREE FOR GROUP BY
0|1
1|47
2|215
3|638
4|1010
5|729
6|50
Run Time: real 0.000 user 0.000000 sys 0.000000


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
TimeThis :    Start Time :  Thu Feb 16 06:47:58 2017
TimeThis :      End Time :  Thu Feb 16 06:47:58 2017
TimeThis :  Elapsed Time :  00:00:00.093


>timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
TimeThis :    Start Time :  Thu Feb 16 06:48:10 2017

SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
.eqp on
.timer on

update GRAPH
   set Level = null;
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.031 user 0.000000 sys 0.015625

update GRAPH
   set Level = 0
 where Parent is null;
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.015 user 0.000000 sys 0.000000

update GRAPH
   set Level = 1
 where Parent in (select Child
                    from GRAPH
                   where Level = 0);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.016 user 0.000000 sys 0.000000

update GRAPH
   set Level = 2
 where Parent in (select Child
                    from GRAPH
                   where Level = 1);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.015 user 0.015625 sys 0.000000

update GRAPH
   set Level = 3
 where Parent in (select Child
                    from GRAPH
                   where Level = 2);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.032 user 0.015625 sys 0.000000

update GRAPH
   set Level = 4
 where Parent in (select Child
                    from GRAPH
                   where Level = 3);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.015 user 0.000000 sys 0.000000

update GRAPH
   set Level = 5
 where Parent in (select Child
                    from GRAPH
                   where Level = 4);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.016 user 0.000000 sys 0.000000

update GRAPH
   set Level = 6
 where Parent in (select Child
                    from GRAPH
                   where Level = 5);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.016 user 0.000000 sys 0.000000

select Level,
         count(*) as Number
    from GRAPH
group by Level;
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,USE TEMP B-TREE FOR GROUP BY
0|1
1|47
2|215
3|638
4|1010
5|729
6|50
Run Time: real 0.015 user 0.000000 sys 0.000000


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
TimeThis :    Start Time :  Thu Feb 16 06:48:10 2017
TimeThis :      End Time :  Thu Feb 16 06:48:10 2017
TimeThis :  Elapsed Time :  00:00:00.203


If I manually add the required covering indexes (actually, I added all possible ones to the end of -Load so the query planner can decide what to do; and also added all the updates in the -Iterative version into a single transaction to more closely reflect that the -recursive version is running only a single update transaction; and put all the inserts in the -Load into a single transaction so that they occur on milliseconds), the results are closer together, but the recursive version still wins at 110 milliseconds vs 141 for the iterative. (The recusive increased due to the requirement to maintain more indexes, and the iterative was faster because it now had indexes to use (but could have been faster without the excess unused indexes)).




>timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
TimeThis :    Start Time :  Thu Feb 16 07:05:55 2017

create table GRAPH
(
    Parent  integer,
    Child   integer,
    Level   integer
);
.echo off
create index Covering1 on Graph (Parent, Child, Level);
create index Covering2 on Graph (Child, Parent, Level);
create index Covering3 on Graph (Level, Parent, Child);
create index Covering4 on Graph (Level, Child, Parent);
analyze;


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
TimeThis :    Start Time :  Thu Feb 16 07:05:55 2017
TimeThis :      End Time :  Thu Feb 16 07:05:55 2017
TimeThis :  Elapsed Time :  00:00:00.140


>timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
TimeThis :    Start Time :  Thu Feb 16 07:06:16 2017

SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
.eqp on
.timer on
update GRAPH
   set Level = null;
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.032 user 0.000000 sys 0.015625

with
HIERARCHY (FromID,ToID,Level) as
(
    select Parent,
           Child,
           '0'
      from GRAPH
     where Parent is null
 union all
    select H.ToID,
           G.Child,
           H.Level + 1
      from HIERARCHY H,
           GRAPH G
     where H.ToID = G.Parent
)
update GRAPH
   set Level = (select Level
                  from HIERARCHY H
                 where ToID = GRAPH.Child);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
--EQP-- 2,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering1 (Parent=?)
--EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
--EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING COVERING INDEX Covering1 (Parent=?)
--EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
Run Time: real 0.047 user 0.015625 sys 0.015625

select Level,
         count(*) as Number
    from GRAPH
group by Level;
--EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
0|1
1|47
2|215
3|638
4|1010
5|729
6|50
Run Time: real 0.000 user 0.000000 sys 0.000000


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
TimeThis :    Start Time :  Thu Feb 16 07:06:16 2017
TimeThis :      End Time :  Thu Feb 16 07:06:16 2017
TimeThis :  Elapsed Time :  00:00:00.110


>timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
TimeThis :    Start Time :  Thu Feb 16 07:06:24 2017

SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
.eqp on
.timer on

update GRAPH
   set Level = null;
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.047 user 0.031250 sys 0.015625

begin;
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 0
 where Parent is null;
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 1
 where Parent in (select Child
                    from GRAPH
                   where Level = 0);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 2
 where Parent in (select Child
                    from GRAPH
                   where Level = 1);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 3
 where Parent in (select Child
                    from GRAPH
                   where Level = 2);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.016 user 0.015625 sys 0.000000

update GRAPH
   set Level = 4
 where Parent in (select Child
                    from GRAPH
                   where Level = 3);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 5
 where Parent in (select Child
                    from GRAPH
                   where Level = 4);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.015 user 0.000000 sys 0.000000

update GRAPH
   set Level = 6
 where Parent in (select Child
                    from GRAPH
                   where Level = 5);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

commit;
Run Time: real 0.016 user 0.000000 sys 0.000000

select Level,
         count(*) as Number
    from GRAPH
group by Level;
--EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
0|1
1|47
2|215
3|638
4|1010
5|729
6|50
Run Time: real 0.015 user 0.015625 sys 0.000000


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
TimeThis :    Start Time :  Thu Feb 16 07:06:24 2017
TimeThis :      End Time :  Thu Feb 16 07:06:24 2017
TimeThis :  Elapsed Time :  00:00:00.141


Creating only the required index (Covering1 and Covering4) results in the changes anticipated, but the recursive version still wins by a smaller margin:  
 93 milliseconds vs 124 milliseconds.

>timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
TimeThis :    Start Time :  Thu Feb 16 07:22:19 2017

create table GRAPH
(
    Parent  integer,
    Child   integer,
    Level   integer
);
.echo off
create index Covering1 on Graph (Parent, Child, Level);
create index Covering4 on Graph (Level, Child, Parent);
analyze;


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
TimeThis :    Start Time :  Thu Feb 16 07:22:19 2017
TimeThis :      End Time :  Thu Feb 16 07:22:19 2017
TimeThis :  Elapsed Time :  00:00:00.109


>timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
TimeThis :    Start Time :  Thu Feb 16 07:22:24 2017

SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
.eqp on
.timer on
update GRAPH
   set Level = null;
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.031 user 0.000000 sys 0.000000

with
HIERARCHY (FromID,ToID,Level) as
(
    select Parent,
           Child,
           '0'
      from GRAPH
     where Parent is null
 union all
    select H.ToID,
           G.Child,
           H.Level + 1
      from HIERARCHY H,
           GRAPH G
     where H.ToID = G.Parent
)
update GRAPH
   set Level = (select Level
                  from HIERARCHY H
                 where ToID = GRAPH.Child);
--EQP-- 0,0,0,SCAN TABLE GRAPH
--EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
--EQP-- 2,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering1 (Parent=?)
--EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
--EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING COVERING INDEX Covering1 (Parent=?)
--EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
Run Time: real 0.032 user 0.000000 sys 0.015625

select Level,
         count(*) as Number
    from GRAPH
group by Level;
--EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
0|1
1|47
2|215
3|638
4|1010
5|729
6|50
Run Time: real 0.000 user 0.000000 sys 0.000000


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
TimeThis :    Start Time :  Thu Feb 16 07:22:24 2017
TimeThis :      End Time :  Thu Feb 16 07:22:24 2017
TimeThis :  Elapsed Time :  00:00:00.093


>timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"

TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
TimeThis :    Start Time :  Thu Feb 16 07:22:28 2017

SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
.eqp on
.timer on

update GRAPH
   set Level = null;
--EQP-- 0,0,0,SCAN TABLE GRAPH
Run Time: real 0.031 user 0.015625 sys 0.000000

begin;
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 0
 where Parent is null;
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 1
 where Parent in (select Child
                    from GRAPH
                   where Level = 0);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 2
 where Parent in (select Child
                    from GRAPH
                   where Level = 1);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 3
 where Parent in (select Child
                    from GRAPH
                   where Level = 2);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 4
 where Parent in (select Child
                    from GRAPH
                   where Level = 3);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.016 user 0.015625 sys 0.000000

update GRAPH
   set Level = 5
 where Parent in (select Child
                    from GRAPH
                   where Level = 4);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

update GRAPH
   set Level = 6
 where Parent in (select Child
                    from GRAPH
                   where Level = 5);
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
--EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
Run Time: real 0.000 user 0.000000 sys 0.000000

commit;
Run Time: real 0.015 user 0.000000 sys 0.000000

select Level,
         count(*) as Number
    from GRAPH
group by Level;
--EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
0|1
1|47
2|215
3|638
4|1010
5|729
6|50
Run Time: real 0.016 user 0.000000 sys 0.000000


TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
TimeThis :    Start Time :  Thu Feb 16 07:22:28 2017
TimeThis :      End Time :  Thu Feb 16 07:22:28 2017
TimeThis :  Elapsed Time :  00:00:00.124

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Jean-Luc Hainaut
> Sent: Thursday, 16 February, 2017 05:57
> To: SQLite mailing list
> Subject: [sqlite] About the performance of recursive WITH
>
> Hi,
>
> This post concerns a strange (imho) behaviour of recursive WITH update
> query as far as execution time is concerned.
>
> I have created (in an "In memory" DB) a table that stores a set of nodes
> arranged in a tree structure (actually the skeleton of the contents of a
> Windows folder):
>
>     create table GRAPH(Parent int, Child int, Level int);
>
> Column "Level" indicates the depth of the node in the tree. The root
> node has Level 0, its children Level 1, and so on.
>
> 2690 nodes have been inserted, with "Level" initialized to null:
>
>     insert into GRAPH(Parent,Child) values (null,1); -- root node (no
> parent)
>     insert into GRAPH(Parent,Child) values (1,9);    -- child of the
> root node
>     insert into GRAPH(Parent,Child) values (9,10);   -- grand-child of
> the root node
>     insert into GRAPH(Parent,Child) values (9,11);
>     insert into GRAPH(Parent,Child) values (9,12);
>     insert into GRAPH(Parent,Child) values (9,13);
>     etc.
>
> Considering the size of the table, no indexes have been created.
> The distribution of nodes among the levels is fairly "normal":
>
>     +-------+--------+
>     | Level | Number |
>     +-------+--------+
>     | 0     | 1      |
>     | 1     | 47     |
>     | 2     | 215    |
>     | 3     | 638    |
>     | 4     | 1010   |
>     | 5     | 729    |
>     | 6     | 50     |
>     +-------+--------+
>
> Now, I would like to compute the "Level" value of all nodes from their
> position in the tree. This task immediately suggests a recursive WITH
> update:
>
>     with recursive
>     HIERARCHY(FromID,ToID,Level) as
>        (
>        select Parent, Child, '0'
>        from   GRAPH where Parent is null
>           union all
>        select H.ToID, G.Child, H.Level + 1
>        from   HIERARCHY H, GRAPH G
>        where  H.ToID = G.Parent
>        )
>     update GRAPH
>     set    Level = (select Level from HIERARCHYwhere ToID = GRAPH.Child);
>
> When this query is executed by SQLite 3.16.2, the timer reports an
> execution time of 5.522 s.  Adding an index on "Parent" and one on
> "Child" just makes things worse (6.381 s.)
>
> I find this figures quite high, so that I try an iterative technique,
> which is likely to be close to the execution strategy of the WITH
> statement (https://www.sqlite.org/lang_with.html). I translate it for
> the CLI shell as follows:
>
>     update GRAPH set Level = 0 where Parent is null;
>
>     update GRAPH set Level = 1
>     where  Parent in (select Child from GRAPH where Level = 0);
>
>     update GRAPH set Level = 2
>     where  Parent in (select Child from GRAPH where Level = 1);
>
>     update GRAPH set Level = 3
>     where  Parent in (select Child from GRAPH where Level = 2);
>
>     update GRAPH set Level = 4
>     where  Parent in (select Child from GRAPH where Level = 3);
>
>     update GRAPH set Level = 5
>     where  Parent in (select Child from GRAPH where Level = 4);
>
>     update GRAPH set Level = 6
>     where  Parent in (select Child from GRAPH where Level = 5);
>
> For this script, I get an execution time of 0.015 s., i.e., nearly 370
> times less!
>
> Is there something wrong in my queries? Or is there an optimization
> trick for WITH queries by which one could approach the performance of
> the iterative version?
>
> The scripts are available here:
> https://www.dropbox.com/s/23t4ycftlk0doy1/GRAPH-performance.zip?dl=0
>
> Thanks for any advice
>
> Jean-Luc Hainaut
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: About the performance of recursive WITH

Jean-Luc Hainaut

Good news (CTE is a real jewel of SQLite)! Thanks Keith.

My experiments were carried out with v16.2 through the CLI shell but I
got the same results with older versions and with Python programs. Times
were given by ".timer on".
I will analyze the query plans (good exercise, never did this so far)
and play with additional indexes. And perhaps wait for next versions ...

JL

> Hmmmm.  With the current head of trunk ...
>
> I run this and the recursive version takes 93 milliseconds vs the iterative version which takes 203 milliseconds.
> Creating all possible covering indexes takes 110 milliseconds vs 141 milliseconds.
> Creating only used/required covering indexes takes 93 milliseconds vs 124 milliseconds
>
> Can you add .timer on and .eqp on and see if you are getting reasonable query plans?  Also, what version of SQLite are you using?
>
> Is the version you are using creating the necessary automatic covering indexes?
>
> If I manually create indexes I get better and worse results, as expected.  Those follow after these runs.
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:14 2017
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:14 2017
> TimeThis :      End Time :  Thu Feb 16 06:47:14 2017
> TimeThis :  Elapsed Time :  00:00:00.078
>
>
>> dir graph.db
> 2017-02-16  06:47            45,056 graph.db
>
>> timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:58 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> with
> HIERARCHY (FromID,ToID,Level) as
> (
>      select Parent,
>             Child,
>             '0'
>        from GRAPH
>       where Parent is null
>   union all
>      select H.ToID,
>             G.Child,
>             H.Level + 1
>        from HIERARCHY H,
>             GRAPH G
>       where H.ToID = G.Parent
> )
> update GRAPH
>     set Level = (select Level
>                    from HIERARCHY H
>                   where ToID = GRAPH.Child);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
> --EQP-- 2,0,0,SCAN TABLE GRAPH
> --EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
> --EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING AUTOMATIC COVERING INDEX (Parent=?)
> --EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
> --EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
> Run Time: real 0.046 user 0.000000 sys 0.015625
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,USE TEMP B-TREE FOR GROUP BY
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:58 2017
> TimeThis :      End Time :  Thu Feb 16 06:47:58 2017
> TimeThis :  Elapsed Time :  00:00:00.093
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 06:48:10 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
>
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.031 user 0.000000 sys 0.015625
>
> update GRAPH
>     set Level = 0
>   where Parent is null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 1
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 0);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 2
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 1);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.015 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 3
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 2);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.032 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 4
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 3);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 5
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 4);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 6
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 5);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,USE TEMP B-TREE FOR GROUP BY
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 06:48:10 2017
> TimeThis :      End Time :  Thu Feb 16 06:48:10 2017
> TimeThis :  Elapsed Time :  00:00:00.203
>
>
> If I manually add the required covering indexes (actually, I added all possible ones to the end of -Load so the query planner can decide what to do; and also added all the updates in the -Iterative version into a single transaction to more closely reflect that the -recursive version is running only a single update transaction; and put all the inserts in the -Load into a single transaction so that they occur on milliseconds), the results are closer together, but the recursive version still wins at 110 milliseconds vs 141 for the iterative. (The recusive increased due to the requirement to maintain more indexes, and the iterative was faster because it now had indexes to use (but could have been faster without the excess unused indexes)).
>
>
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:05:55 2017
>
> create table GRAPH
> (
>      Parent  integer,
>      Child   integer,
>      Level   integer
> );
> .echo off
> create index Covering1 on Graph (Parent, Child, Level);
> create index Covering2 on Graph (Child, Parent, Level);
> create index Covering3 on Graph (Level, Parent, Child);
> create index Covering4 on Graph (Level, Child, Parent);
> analyze;
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:05:55 2017
> TimeThis :      End Time :  Thu Feb 16 07:05:55 2017
> TimeThis :  Elapsed Time :  00:00:00.140
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:16 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.032 user 0.000000 sys 0.015625
>
> with
> HIERARCHY (FromID,ToID,Level) as
> (
>      select Parent,
>             Child,
>             '0'
>        from GRAPH
>       where Parent is null
>   union all
>      select H.ToID,
>             G.Child,
>             H.Level + 1
>        from HIERARCHY H,
>             GRAPH G
>       where H.ToID = G.Parent
> )
> update GRAPH
>     set Level = (select Level
>                    from HIERARCHY H
>                   where ToID = GRAPH.Child);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
> --EQP-- 2,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
> --EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
> --EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
> Run Time: real 0.047 user 0.015625 sys 0.015625
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:16 2017
> TimeThis :      End Time :  Thu Feb 16 07:06:16 2017
> TimeThis :  Elapsed Time :  00:00:00.110
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:24 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
>
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.047 user 0.031250 sys 0.015625
>
> begin;
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 0
>   where Parent is null;
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 1
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 0);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 2
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 1);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 3
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 2);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.016 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 4
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 3);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 5
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 4);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 6
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 5);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> commit;
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.015 user 0.015625 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:24 2017
> TimeThis :      End Time :  Thu Feb 16 07:06:24 2017
> TimeThis :  Elapsed Time :  00:00:00.141
>
>
> Creating only the required index (Covering1 and Covering4) results in the changes anticipated, but the recursive version still wins by a smaller margin:
>   93 milliseconds vs 124 milliseconds.
>
>> timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:19 2017
>
> create table GRAPH
> (
>      Parent  integer,
>      Child   integer,
>      Level   integer
> );
> .echo off
> create index Covering1 on Graph (Parent, Child, Level);
> create index Covering4 on Graph (Level, Child, Parent);
> analyze;
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:19 2017
> TimeThis :      End Time :  Thu Feb 16 07:22:19 2017
> TimeThis :  Elapsed Time :  00:00:00.109
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:24 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.031 user 0.000000 sys 0.000000
>
> with
> HIERARCHY (FromID,ToID,Level) as
> (
>      select Parent,
>             Child,
>             '0'
>        from GRAPH
>       where Parent is null
>   union all
>      select H.ToID,
>             G.Child,
>             H.Level + 1
>        from HIERARCHY H,
>             GRAPH G
>       where H.ToID = G.Parent
> )
> update GRAPH
>     set Level = (select Level
>                    from HIERARCHY H
>                   where ToID = GRAPH.Child);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
> --EQP-- 2,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
> --EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
> --EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
> Run Time: real 0.032 user 0.000000 sys 0.015625
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:24 2017
> TimeThis :      End Time :  Thu Feb 16 07:22:24 2017
> TimeThis :  Elapsed Time :  00:00:00.093
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:28 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
>
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.031 user 0.015625 sys 0.000000
>
> begin;
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 0
>   where Parent is null;
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 1
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 0);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 2
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 1);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 3
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 2);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 4
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 3);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.016 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 5
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 4);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 6
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 5);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> commit;
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:28 2017
> TimeThis :      End Time :  Thu Feb 16 07:22:28 2017
> TimeThis :  Elapsed Time :  00:00:00.124
>
>> -----Original Message-----
>> From: sqlite-users [mailto:[hidden email]]
>> On Behalf Of Jean-Luc Hainaut
>> Sent: Thursday, 16 February, 2017 05:57
>> To: SQLite mailing list
>> Subject: [sqlite] About the performance of recursive WITH
>>
>> Hi,
>>
>> This post concerns a strange (imho) behaviour of recursive WITH update
>> query as far as execution time is concerned.
>>
>> I have created (in an "In memory" DB) a table that stores a set of nodes
>> arranged in a tree structure (actually the skeleton of the contents of a
>> Windows folder):
>>
>>      create table GRAPH(Parent int, Child int, Level int);
>>
>> Column "Level" indicates the depth of the node in the tree. The root
>> node has Level 0, its children Level 1, and so on.
>>
>> 2690 nodes have been inserted, with "Level" initialized to null:
>>
>>      insert into GRAPH(Parent,Child) values (null,1); -- root node (no
>> parent)
>>      insert into GRAPH(Parent,Child) values (1,9);    -- child of the
>> root node
>>      insert into GRAPH(Parent,Child) values (9,10);   -- grand-child of
>> the root node
>>      insert into GRAPH(Parent,Child) values (9,11);
>>      insert into GRAPH(Parent,Child) values (9,12);
>>      insert into GRAPH(Parent,Child) values (9,13);
>>      etc.
>>
>> Considering the size of the table, no indexes have been created.
>> The distribution of nodes among the levels is fairly "normal":
>>
>>      +-------+--------+
>>      | Level | Number |
>>      +-------+--------+
>>      | 0     | 1      |
>>      | 1     | 47     |
>>      | 2     | 215    |
>>      | 3     | 638    |
>>      | 4     | 1010   |
>>      | 5     | 729    |
>>      | 6     | 50     |
>>      +-------+--------+
>>
>> Now, I would like to compute the "Level" value of all nodes from their
>> position in the tree. This task immediately suggests a recursive WITH
>> update:
>>
>>      with recursive
>>      HIERARCHY(FromID,ToID,Level) as
>>         (
>>         select Parent, Child, '0'
>>         from   GRAPH where Parent is null
>>            union all
>>         select H.ToID, G.Child, H.Level + 1
>>         from   HIERARCHY H, GRAPH G
>>         where  H.ToID = G.Parent
>>         )
>>      update GRAPH
>>      set    Level = (select Level from HIERARCHYwhere ToID = GRAPH.Child);
>>
>> When this query is executed by SQLite 3.16.2, the timer reports an
>> execution time of 5.522 s.  Adding an index on "Parent" and one on
>> "Child" just makes things worse (6.381 s.)
>>
>> I find this figures quite high, so that I try an iterative technique,
>> which is likely to be close to the execution strategy of the WITH
>> statement (https://www.sqlite.org/lang_with.html). I translate it for
>> the CLI shell as follows:
>>
>>      update GRAPH set Level = 0 where Parent is null;
>>
>>      update GRAPH set Level = 1
>>      where  Parent in (select Child from GRAPH where Level = 0);
>>
>>      update GRAPH set Level = 2
>>      where  Parent in (select Child from GRAPH where Level = 1);
>>
>>      update GRAPH set Level = 3
>>      where  Parent in (select Child from GRAPH where Level = 2);
>>
>>      update GRAPH set Level = 4
>>      where  Parent in (select Child from GRAPH where Level = 3);
>>
>>      update GRAPH set Level = 5
>>      where  Parent in (select Child from GRAPH where Level = 4);
>>
>>      update GRAPH set Level = 6
>>      where  Parent in (select Child from GRAPH where Level = 5);
>>
>> For this script, I get an execution time of 0.015 s., i.e., nearly 370
>> times less!
>>
>> Is there something wrong in my queries? Or is there an optimization
>> trick for WITH queries by which one could approach the performance of
>> the iterative version?
>>
>> The scripts are available here:
>> https://www.dropbox.com/s/23t4ycftlk0doy1/GRAPH-performance.zip?dl=0
>>
>> Thanks for any advice
>>
>> Jean-Luc Hainaut
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: About the performance of recursive WITH

Jean-Luc Hainaut
In reply to this post by Keith Medcalf
On 16/02/2017 15:34, Keith Medcalf wrote:
> Hmmmm.  With the current head of trunk ...
>
> I run this and the recursive version takes 93 milliseconds vs the iterative version which takes 203 milliseconds.

Mystery solved.
Now, I get realistic figures with sqlite.dll version 3.17 (resp. 0.012
and 0.015 for recursive and iterative, no manual indexes), which
obviously includes new optimization mechanics for such queries.

Thanks for the time you took for answering!

Jean-Luc

> Creating all possible covering indexes takes 110 milliseconds vs 141 milliseconds.
> Creating only used/required covering indexes takes 93 milliseconds vs 124 milliseconds
>
> Can you add .timer on and .eqp on and see if you are getting reasonable query plans?  Also, what version of SQLite are you using?
>
> Is the version you are using creating the necessary automatic covering indexes?
>
> If I manually create indexes I get better and worse results, as expected.  Those follow after these runs.
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:14 2017
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:14 2017
> TimeThis :      End Time :  Thu Feb 16 06:47:14 2017
> TimeThis :  Elapsed Time :  00:00:00.078
>
>
>> dir graph.db
> 2017-02-16  06:47            45,056 graph.db
>
>> timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:58 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> with
> HIERARCHY (FromID,ToID,Level) as
> (
>      select Parent,
>             Child,
>             '0'
>        from GRAPH
>       where Parent is null
>   union all
>      select H.ToID,
>             G.Child,
>             H.Level + 1
>        from HIERARCHY H,
>             GRAPH G
>       where H.ToID = G.Parent
> )
> update GRAPH
>     set Level = (select Level
>                    from HIERARCHY H
>                   where ToID = GRAPH.Child);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
> --EQP-- 2,0,0,SCAN TABLE GRAPH
> --EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
> --EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING AUTOMATIC COVERING INDEX (Parent=?)
> --EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
> --EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
> Run Time: real 0.046 user 0.000000 sys 0.015625
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,USE TEMP B-TREE FOR GROUP BY
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 06:47:58 2017
> TimeThis :      End Time :  Thu Feb 16 06:47:58 2017
> TimeThis :  Elapsed Time :  00:00:00.093
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 06:48:10 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
>
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.031 user 0.000000 sys 0.015625
>
> update GRAPH
>     set Level = 0
>   where Parent is null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 1
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 0);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 2
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 1);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.015 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 3
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 2);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.032 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 4
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 3);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 5
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 4);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 6
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 5);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,USE TEMP B-TREE FOR GROUP BY
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 06:48:10 2017
> TimeThis :      End Time :  Thu Feb 16 06:48:10 2017
> TimeThis :  Elapsed Time :  00:00:00.203
>
>
> If I manually add the required covering indexes (actually, I added all possible ones to the end of -Load so the query planner can decide what to do; and also added all the updates in the -Iterative version into a single transaction to more closely reflect that the -recursive version is running only a single update transaction; and put all the inserts in the -Load into a single transaction so that they occur on milliseconds), the results are closer together, but the recursive version still wins at 110 milliseconds vs 141 for the iterative. (The recusive increased due to the requirement to maintain more indexes, and the iterative was faster because it now had indexes to use (but could have been faster without the excess unused indexes)).
>
>
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:05:55 2017
>
> create table GRAPH
> (
>      Parent  integer,
>      Child   integer,
>      Level   integer
> );
> .echo off
> create index Covering1 on Graph (Parent, Child, Level);
> create index Covering2 on Graph (Child, Parent, Level);
> create index Covering3 on Graph (Level, Parent, Child);
> create index Covering4 on Graph (Level, Child, Parent);
> analyze;
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:05:55 2017
> TimeThis :      End Time :  Thu Feb 16 07:05:55 2017
> TimeThis :  Elapsed Time :  00:00:00.140
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:16 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.032 user 0.000000 sys 0.015625
>
> with
> HIERARCHY (FromID,ToID,Level) as
> (
>      select Parent,
>             Child,
>             '0'
>        from GRAPH
>       where Parent is null
>   union all
>      select H.ToID,
>             G.Child,
>             H.Level + 1
>        from HIERARCHY H,
>             GRAPH G
>       where H.ToID = G.Parent
> )
> update GRAPH
>     set Level = (select Level
>                    from HIERARCHY H
>                   where ToID = GRAPH.Child);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
> --EQP-- 2,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
> --EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
> --EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
> Run Time: real 0.047 user 0.015625 sys 0.015625
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:16 2017
> TimeThis :      End Time :  Thu Feb 16 07:06:16 2017
> TimeThis :  Elapsed Time :  00:00:00.110
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:24 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
>
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.047 user 0.031250 sys 0.015625
>
> begin;
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 0
>   where Parent is null;
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 1
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 0);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 2
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 1);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 3
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 2);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.016 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 4
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 3);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 5
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 4);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 6
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 5);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> commit;
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.015 user 0.015625 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:06:24 2017
> TimeThis :      End Time :  Thu Feb 16 07:06:24 2017
> TimeThis :  Elapsed Time :  00:00:00.141
>
>
> Creating only the required index (Covering1 and Covering4) results in the changes anticipated, but the recursive version still wins by a smaller margin:
>   93 milliseconds vs 124 milliseconds.
>
>> timethis "sqlite64 graph.db < GRAPH-performance-Load.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:19 2017
>
> create table GRAPH
> (
>      Parent  integer,
>      Child   integer,
>      Level   integer
> );
> .echo off
> create index Covering1 on Graph (Parent, Child, Level);
> create index Covering4 on Graph (Level, Child, Parent);
> analyze;
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-Load.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:19 2017
> TimeThis :      End Time :  Thu Feb 16 07:22:19 2017
> TimeThis :  Elapsed Time :  00:00:00.109
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-recursive.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:24 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.031 user 0.000000 sys 0.000000
>
> with
> HIERARCHY (FromID,ToID,Level) as
> (
>      select Parent,
>             Child,
>             '0'
>        from GRAPH
>       where Parent is null
>   union all
>      select H.ToID,
>             G.Child,
>             H.Level + 1
>        from HIERARCHY H,
>             GRAPH G
>       where H.ToID = G.Parent
> )
> update GRAPH
>     set Level = (select Level
>                    from HIERARCHY H
>                   where ToID = GRAPH.Child);
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> --EQP-- 0,0,0,EXECUTE CORRELATED SCALAR SUBQUERY 0
> --EQP-- 2,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 3,0,0,SCAN TABLE HIERARCHY AS H
> --EQP-- 3,1,1,SEARCH TABLE GRAPH AS G USING COVERING INDEX Covering1 (Parent=?)
> --EQP-- 1,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
> --EQP-- 0,0,0,SEARCH SUBQUERY 1 AS H USING AUTOMATIC COVERING INDEX (ToID=?)
> Run Time: real 0.032 user 0.000000 sys 0.015625
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-recursive.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:24 2017
> TimeThis :      End Time :  Thu Feb 16 07:22:24 2017
> TimeThis :  Elapsed Time :  00:00:00.093
>
>
>> timethis "sqlite64 graph.db < GRAPH-performance-iterative.sql"
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:28 2017
>
> SQLite 3.18.0 2017-02-15 22:36:15 58797e9bafa95709e0f706a15f42f93b409e2db5
> .eqp on
> .timer on
>
> update GRAPH
>     set Level = null;
> --EQP-- 0,0,0,SCAN TABLE GRAPH
> Run Time: real 0.031 user 0.015625 sys 0.000000
>
> begin;
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 0
>   where Parent is null;
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 1
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 0);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 2
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 1);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 3
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 2);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 4
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 3);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.016 user 0.015625 sys 0.000000
>
> update GRAPH
>     set Level = 5
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 4);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> update GRAPH
>     set Level = 6
>   where Parent in (select Child
>                      from GRAPH
>                     where Level = 5);
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING INDEX Covering1 (Parent=?)
> --EQP-- 0,0,0,EXECUTE LIST SUBQUERY 0
> --EQP-- 0,0,0,SEARCH TABLE GRAPH USING COVERING INDEX Covering4 (Level=?)
> Run Time: real 0.000 user 0.000000 sys 0.000000
>
> commit;
> Run Time: real 0.015 user 0.000000 sys 0.000000
>
> select Level,
>           count(*) as Number
>      from GRAPH
> group by Level;
> --EQP-- 0,0,0,SCAN TABLE GRAPH USING COVERING INDEX Covering4
> 0|1
> 1|47
> 2|215
> 3|638
> 4|1010
> 5|729
> 6|50
> Run Time: real 0.016 user 0.000000 sys 0.000000
>
>
> TimeThis :  Command Line :  sqlite64 graph.db < GRAPH-performance-iterative.sql
> TimeThis :    Start Time :  Thu Feb 16 07:22:28 2017
> TimeThis :      End Time :  Thu Feb 16 07:22:28 2017
> TimeThis :  Elapsed Time :  00:00:00.124
>
>> -----Original Message-----
>> From: sqlite-users [mailto:[hidden email]]
>> On Behalf Of Jean-Luc Hainaut
>> Sent: Thursday, 16 February, 2017 05:57
>> To: SQLite mailing list
>> Subject: [sqlite] About the performance of recursive WITH
>>
>> Hi,
>>
>> This post concerns a strange (imho) behaviour of recursive WITH update
>> query as far as execution time is concerned.
>>
>> I have created (in an "In memory" DB) a table that stores a set of nodes
>> arranged in a tree structure (actually the skeleton of the contents of a
>> Windows folder):
>>
>>      create table GRAPH(Parent int, Child int, Level int);
>>
>> Column "Level" indicates the depth of the node in the tree. The root
>> node has Level 0, its children Level 1, and so on.
>>
>> 2690 nodes have been inserted, with "Level" initialized to null:
>>
>>      insert into GRAPH(Parent,Child) values (null,1); -- root node (no
>> parent)
>>      insert into GRAPH(Parent,Child) values (1,9);    -- child of the
>> root node
>>      insert into GRAPH(Parent,Child) values (9,10);   -- grand-child of
>> the root node
>>      insert into GRAPH(Parent,Child) values (9,11);
>>      insert into GRAPH(Parent,Child) values (9,12);
>>      insert into GRAPH(Parent,Child) values (9,13);
>>      etc.
>>
>> Considering the size of the table, no indexes have been created.
>> The distribution of nodes among the levels is fairly "normal":
>>
>>      +-------+--------+
>>      | Level | Number |
>>      +-------+--------+
>>      | 0     | 1      |
>>      | 1     | 47     |
>>      | 2     | 215    |
>>      | 3     | 638    |
>>      | 4     | 1010   |
>>      | 5     | 729    |
>>      | 6     | 50     |
>>      +-------+--------+
>>
>> Now, I would like to compute the "Level" value of all nodes from their
>> position in the tree. This task immediately suggests a recursive WITH
>> update:
>>
>>      with recursive
>>      HIERARCHY(FromID,ToID,Level) as
>>         (
>>         select Parent, Child, '0'
>>         from   GRAPH where Parent is null
>>            union all
>>         select H.ToID, G.Child, H.Level + 1
>>         from   HIERARCHY H, GRAPH G
>>         where  H.ToID = G.Parent
>>         )
>>      update GRAPH
>>      set    Level = (select Level from HIERARCHYwhere ToID = GRAPH.Child);
>>
>> When this query is executed by SQLite 3.16.2, the timer reports an
>> execution time of 5.522 s.  Adding an index on "Parent" and one on
>> "Child" just makes things worse (6.381 s.)
>>
>> I find this figures quite high, so that I try an iterative technique,
>> which is likely to be close to the execution strategy of the WITH
>> statement (https://www.sqlite.org/lang_with.html). I translate it for
>> the CLI shell as follows:
>>
>>      update GRAPH set Level = 0 where Parent is null;
>>
>>      update GRAPH set Level = 1
>>      where  Parent in (select Child from GRAPH where Level = 0);
>>
>>      update GRAPH set Level = 2
>>      where  Parent in (select Child from GRAPH where Level = 1);
>>
>>      update GRAPH set Level = 3
>>      where  Parent in (select Child from GRAPH where Level = 2);
>>
>>      update GRAPH set Level = 4
>>      where  Parent in (select Child from GRAPH where Level = 3);
>>
>>      update GRAPH set Level = 5
>>      where  Parent in (select Child from GRAPH where Level = 4);
>>
>>      update GRAPH set Level = 6
>>      where  Parent in (select Child from GRAPH where Level = 5);
>>
>> For this script, I get an execution time of 0.015 s., i.e., nearly 370
>> times less!
>>
>> Is there something wrong in my queries? Or is there an optimization
>> trick for WITH queries by which one could approach the performance of
>> the iterative version?
>>
>> The scripts are available here:
>> https://www.dropbox.com/s/23t4ycftlk0doy1/GRAPH-performance.zip?dl=0
>>
>> Thanks for any advice
>>
>> Jean-Luc Hainaut
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users