Query planner improvements in case of AUTOMATIC INDEX

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Query planner improvements in case of AUTOMATIC INDEX

Rob Golsteijn
Hi List,

When investigating performance of one of our queries I found an interesting situation that might be an opportunity for performance improvement.
Tested with Sqlite version 3.15.2 (November 2016).

Consider the following table and query

CREATE TABLE Node
(
    Id              INTEGER PRIMARY KEY AUTOINCREMENT,
    x               NUMBER(10),
    y               NUMBER(10),
    HeightLevel     NUMBER(2),
    GeomWGS84       BLOB,
    Perm_id         TEXT
    /* some irrelevant fields removed */
);

/* find duplicates */
SELECT NOD1.PERM_ID,
       NOD1.X,
       NOD1.Y,
       NOD1.HeightLevel,
       NOD1.GeomWGS84
  FROM            Node NOD1
       INNER JOIN Node NOD2 ON NOD1.X           = NOD2.X
                           AND NOD1.Y           = NOD2.Y
                           AND NOD1.HeightLevel = NOD2.HeightLevel
                           AND NOD1.GeomWGS84   = NOD2.GeomWGS84
                           AND NOD1.ID         <> NOD2.ID
  ORDER BY NOD1.GeomWGS84;

The query plan of this query is
selectid|order|from|detail
0|0|0|SCAN TABLE Node AS NOD1
0|1|1|SEARCH TABLE Node AS NOD2 USING AUTOMATIC COVERING INDEX (GeomWGS84=? AND HeightLevel=? AND y=? AND x=?)
0|0|0|USE TEMP B-TREE FOR ORDER BY

It takes 636 seconds wall clock time to execute the query.

I would expect that this is the execution time of creating the index + the execution time of the query when the index is already present.
So if I create the "AUTOMATIC" index explicitly the expected total execution time would also be 636 seconds, but

CREATE INDEX idx_node on node (GeomWGS84, HeightLevel, y, x);
Takes 40 seconds and subsequent query execution takes 8 seconds. So 48 seconds in total compared to 636 second with the AUTOMATIC query.

The explanation can be found when looking at the query plan of the query (in the new schema with index present):

selectid|order|from|detail
0|0|0|SCAN TABLE DH_NOD AS NOD1 USING INDEX idx_node
0|1|1|SEARCH TABLE DH_NOD AS NOD2 USING COVERING INDEX idx_node (GeomWGS84=? AND HeightLevel=? AND y=? AND x=?)

So the explicit index is now also used for ORDER BY optimization.

I guess in general it could be used for other optimizations as well .


The optimization possibility is to re-evaluate the query plan, taking also the AUTOMATIC indexes into account, once Sqlite decided that AUTOMATIC indexes are useful.
To avoid extra planning time, maybe this should only be done when AUTOMATICALLY INDEXED table(s) are used multiple times in the query (otherwise they will not change the query plan anyway)?
Since query planning is typically fast compared to query execution, the extra iteration of the query planner may be acceptable for the cases the query plan cannot be improved. For our company it would be acceptable but in general I cannot judge.

Regards,
Rob Golsteijn
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users