Query planner improvements in case of AUTOMATIC INDEX
When investigating performance of one of our queries I found an interesting situation that might be an opportunity for performance improvement.
Tested with Sqlite version 3.15.2 (November 2016).
Consider the following table and query
CREATE TABLE Node
Id INTEGER PRIMARY KEY AUTOINCREMENT,
/* some irrelevant fields removed */
/* find duplicates */
FROM Node NOD1
INNER JOIN Node NOD2 ON NOD1.X = NOD2.X
AND NOD1.Y = NOD2.Y
AND NOD1.HeightLevel = NOD2.HeightLevel
AND NOD1.GeomWGS84 = NOD2.GeomWGS84
AND NOD1.ID <> NOD2.ID
ORDER BY NOD1.GeomWGS84;
The query plan of this query is
0|0|0|SCAN TABLE Node AS NOD1
0|1|1|SEARCH TABLE Node AS NOD2 USING AUTOMATIC COVERING INDEX (GeomWGS84=? AND HeightLevel=? AND y=? AND x=?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
It takes 636 seconds wall clock time to execute the query.
I would expect that this is the execution time of creating the index + the execution time of the query when the index is already present.
So if I create the "AUTOMATIC" index explicitly the expected total execution time would also be 636 seconds, but
CREATE INDEX idx_node on node (GeomWGS84, HeightLevel, y, x);
Takes 40 seconds and subsequent query execution takes 8 seconds. So 48 seconds in total compared to 636 second with the AUTOMATIC query.
The explanation can be found when looking at the query plan of the query (in the new schema with index present):
0|0|0|SCAN TABLE DH_NOD AS NOD1 USING INDEX idx_node
0|1|1|SEARCH TABLE DH_NOD AS NOD2 USING COVERING INDEX idx_node (GeomWGS84=? AND HeightLevel=? AND y=? AND x=?)
So the explicit index is now also used for ORDER BY optimization.
I guess in general it could be used for other optimizations as well .
The optimization possibility is to re-evaluate the query plan, taking also the AUTOMATIC indexes into account, once Sqlite decided that AUTOMATIC indexes are useful.
To avoid extra planning time, maybe this should only be done when AUTOMATICALLY INDEXED table(s) are used multiple times in the query (otherwise they will not change the query plan anyway)?
Since query planning is typically fast compared to query execution, the extra iteration of the query planner may be acceptable for the cases the query plan cannot be improved. For our company it would be acceptable but in general I cannot judge.