Missing several important tricks in partial indexes
If I create a partial index:
create table a (a0, a1)
create index idx on a (a0) where a1 is null;
Then we have several issues:
1) This should be a covering index for
select a0, a1 from a where a1 is null;
It isn't. It's a great index, but we still defer to the table to look up
the (always null) values of a1.
2) Actually, it's slightly worse than that:
select a0, null as a1 from a where a1 is null;
will select the index, because the condition a1 matches, but it seems
that it's STILL not considered a covering index, (guessing now)
presumably because it's not SUFFICIENT that the condition on the query
is a superset of the query on the index.
Including a1 in the index makes it a covering index, despite the fact
that a1 is always null.
I care more about this than most algebraic manipulations because it's a
whole extra order of I/O in a lookup, and it's turning a 1-second query
into a 51-second query, which is now ineligible for use in a web page.
Right now, it's more worthwhile to pay the space penalty for the
additional constant value at the end of the index.
3) The third issue is almost more major than the preceding two, and has
nothing to do with covering, but with partial index selection AT ALL:
If we do a select "WHERE a1 IS ?" and pass a null for the bind-value of
? it will never select the partial index, even though it's eligible.
Re: Missing several important tricks in partial indexes
On 2018-02-27 08:27, Shevek wrote:
> If I create a partial index:
> create table a (a0, a1)
> create index idx on a (a0) where a1 is null;
> 3) The third issue is almost more major than the preceding two, and has
> nothing to do with covering, but with partial index selection AT ALL:
> If we do a select "WHERE a1 IS ?" and pass a null for the bind-value of
> ? it will never select the partial index, even though it's eligible.
This is due to a fact that a code for VDBE is generated before any
parameter substitution takes place. Such code contains a parameter
substitution command (OP_Variable). Said code does not contain two
branches: (1) for param == NULL; (2) for param != NULL.
IMHO, a problem, you are describing, comes from a never-ending problem
of SQL optimization: PREPARE vs. EXECUTION efficiency. It has been
mentioned by DRH a few weeks ago, so it is considered by the team.
There are plenty of things, which can be done by an optimizer. What
about INDEX... WHERE a1 > 0 AND a1 < 100 AND a1 <> 78, and SELECT...
WHERE a1 > 10 AND a1 < 20 OR 10*a1 = 306? Certainly, I do not think that
your example falls into ``making an algebra system'' category, and I am
calm that the previously-mentioned by DRH fact will be considered by the
team while implementing of such optimization.
Being or not NULL is one of the most frequent things which are occurring
in db systems -- thus your wonderment seems to be quite justified.
However it is desirable to take into account that every optimization
speeds up an execution at the price of a preparation (as every index
speeds up a query at the price of an update/insert) -- it is easy to say
``why such and that index is not considered in such and that case?'' --
I have not seen ``my system could speed up due to an elimination of rare
optimizations''. Speeding up is done (too frequently) by adding
consecutive optimizations (on an engine's side) and indices (on an
user's side) without a balance. (Still, abstracting from your particular
justified example of being NULL.) Not every system operates on trillion
records dbs taken from a heaven and executes fancy queries -- there are
also systems which update/insert something and run many relatively
simple queries on a relatively small data.