Recursive CTE optimization

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Recursive CTE optimization

Matthias-Christian Ott
I have the following (simplified and generalized) schema and query:

CREATE TABLE nodes (
  id INTEGER NOT NULL PRIMARY KEY
);

CREATE TABLE edges (
  a INTEGER NOT NULL REFERENCES nodes (id),
  b INTEGER NOT NULL REFERENCES nodes (id),
  PRIMARY KEY (a, b)
);

WITH reachable (a, b) AS (
  SELECT a, b FROM edges
  UNION ALL
  SELECT reachable.a AS a, edges.b AS b
  FROM reachable
  INNER JOIN edges ON reachable.b = edges.a
)
SELECT nodes.id, reachable.b
FROM nodes
INNER JOIN reachable ON a = id
WHERE id = 1;

The query plan for the query looks like this:

2|0|0|SCAN TABLE edges
3|0|0|SCAN TABLE reachable
3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 (a=?)
1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
0|1|1|SCAN SUBQUERY 1

The query first computes the reachable relation for all nodes and then
filters out the reachable nodes of node 1 which is clearly inefficient.

If I'm not mistaken the query could be optimized to:

WITH reachable (a, b) AS (
  SELECT a, b FROM edges WHERE a = 1
  UNION ALL
  SELECT reachable.a AS a, edges.b AS b
  FROM reachable
  INNER JOIN edges ON reachable.b = edges.a
)
SELECT nodes.id, reachable.b
FROM nodes
INNER JOIN reachable ON a = id
WHERE id = 1;

The query plan for the optimized query looks like this:

2|0|0|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 (a=?)
3|0|0|SCAN TABLE reachable
3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 (a=?)
1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
0|1|1|SCAN SUBQUERY 1

It looks like the optimization can be generalized because you can pull
the selection into the initial select query of the CTE. Is there a
chance that SQLite could perform this optimization in one of the next
releases?

I could tolerate with repeating the parameter but I want to create a
view out of it, so it's not really an option. Are there any alternatives
if the optimization will not be automated?

Moreover, "COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)" looks
wrong, shouldn't it read "COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE
(UNION)"?

- Matthias-Christian
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Recursive CTE optimization

Richard Hipp-3
On 5/27/15, Matthias-Christian Ott <[hidden email]> wrote:

> It looks like the optimization can be generalized because you can pull
> the selection into the initial select query of the CTE. Is there a
> chance that SQLite could perform this optimization in one of the next
> releases?
>

Are you offering to fund that effort?
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Recursive CTE optimization

Darko Volaric
In reply to this post by Matthias-Christian Ott
I think the difficulty here is that the optimizer is oriented toward the
"low level" and mainly concerned with choosing indexes, processing order
etc  (see https://www.sqlite.org/optoverview.html ) and has sort of a
narrow view of the task.

Approaching it from the other end and breaking down the SQL statement into
a set of constraints over a set of relations linked by those constraints
you can use ideas from constraint programming, such as your optimisation
which is constraint propagation: adding redundant constraints that are
logically implied by the existing constraints; the redundant constraints
don't change the meaning of the query but serve to "prune" the query space
thus reducing the work.

This sort of approach could be implemented as a SQL to SQL transformation
and would complement the existing query planner. Maybe there's a tool out
there already which processes standard SQL in this way?

On Wed, May 27, 2015 at 3:31 AM, Matthias-Christian Ott <[hidden email]>
wrote:

> I have the following (simplified and generalized) schema and query:
>
> CREATE TABLE nodes (
>   id INTEGER NOT NULL PRIMARY KEY
> );
>
> CREATE TABLE edges (
>   a INTEGER NOT NULL REFERENCES nodes (id),
>   b INTEGER NOT NULL REFERENCES nodes (id),
>   PRIMARY KEY (a, b)
> );
>
> WITH reachable (a, b) AS (
>   SELECT a, b FROM edges
>   UNION ALL
>   SELECT reachable.a AS a, edges.b AS b
>   FROM reachable
>   INNER JOIN edges ON reachable.b = edges.a
> )
> SELECT nodes.id, reachable.b
> FROM nodes
> INNER JOIN reachable ON a = id
> WHERE id = 1;
>
> The query plan for the query looks like this:
>
> 2|0|0|SCAN TABLE edges
> 3|0|0|SCAN TABLE reachable
> 3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1
> (a=?)
> 1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
> 0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
> 0|1|1|SCAN SUBQUERY 1
>
> The query first computes the reachable relation for all nodes and then
> filters out the reachable nodes of node 1 which is clearly inefficient.
>
> If I'm not mistaken the query could be optimized to:
>
> WITH reachable (a, b) AS (
>   SELECT a, b FROM edges WHERE a = 1
>   UNION ALL
>   SELECT reachable.a AS a, edges.b AS b
>   FROM reachable
>   INNER JOIN edges ON reachable.b = edges.a
> )
> SELECT nodes.id, reachable.b
> FROM nodes
> INNER JOIN reachable ON a = id
> WHERE id = 1;
>
> The query plan for the optimized query looks like this:
>
> 2|0|0|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1
> (a=?)
> 3|0|0|SCAN TABLE reachable
> 3|1|1|SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1
> (a=?)
> 1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
> 0|0|0|SEARCH TABLE nodes USING INTEGER PRIMARY KEY (rowid=?)
> 0|1|1|SCAN SUBQUERY 1
>
> It looks like the optimization can be generalized because you can pull
> the selection into the initial select query of the CTE. Is there a
> chance that SQLite could perform this optimization in one of the next
> releases?
>
> I could tolerate with repeating the parameter but I want to create a
> view out of it, so it's not really an option. Are there any alternatives
> if the optimization will not be automated?
>
> Moreover, "COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)" looks
> wrong, shouldn't it read "COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE
> (UNION)"?
>
> - Matthias-Christian
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users