calculated-value indexes are not covering?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

calculated-value indexes are not covering?

wmertens
Back with more indexing questions :)

12991 $ sqlite3
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table t(j json, s string);
sqlite> create index s on t(s);
sqlite> create index j on t(json_extract(j, '$.foo'));
sqlite> create index l on t(length(s));
sqlite> explain query plan select distinct s from t;
0|0|0|SCAN TABLE t USING COVERING INDEX s
sqlite> explain query plan select distinct json_extract(j, '$.foo') from t;
0|0|0|SCAN TABLE t USING INDEX j
sqlite> explain query plan select distinct length(s) from t;
0|0|0|SCAN TABLE t USING INDEX l

How come the indexes on calculated values are not considered COVERING?
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

Nico Williams
On Wed, Aug 09, 2017 at 06:48:51PM +0000, Wout Mertens wrote:
> sqlite> create table t(j json, s string);
> sqlite> create index s on t(s);
> sqlite> create index j on t(json_extract(j, '$.foo'));
> sqlite> create index l on t(length(s));

In order for any of these indices to be covering indices you need to add
all the columns of the table t to them:

sqlite> create table t(j json, s string);
sqlite> create index s on t(s, j);
sqlite> create index j on t(json_extract(j, '$.foo'), j, s);
sqlite> create index l on t(length(s), s, j);

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

wmertens
but… index s is covering and only includes the field s? I thought a
covering index was one where all the data needed to satisfy the query is in
index? I would say that all the indexes here conform to that definition?

https://sqlite.org/optoverview.html 8.0 covering index

> If, however, all columns that were to be fetched from the table are
already available in the index itself, SQLite will use the values contained
in the index and will never look up the original table row

On Wed, Aug 9, 2017 at 8:55 PM Nico Williams <[hidden email]> wrote:

> On Wed, Aug 09, 2017 at 06:48:51PM +0000, Wout Mertens wrote:
> > sqlite> create table t(j json, s string);
> > sqlite> create index s on t(s);
> > sqlite> create index j on t(json_extract(j, '$.foo'));
> > sqlite> create index l on t(length(s));
>
> In order for any of these indices to be covering indices you need to add
> all the columns of the table t to them:
>
> sqlite> create table t(j json, s string);
> sqlite> create index s on t(s, j);
> sqlite> create index j on t(json_extract(j, '$.foo'), j, s);
> sqlite> create index l on t(length(s), s, j);
>
> Nico
> --
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

David Raymond
There's the issue of whether SQLite takes the value from the index, or recalculates it from the table data. So for a "covering index" you would need to index all the inputs to the function, for example

sqlite> create index lc on t (length(s), s);

sqlite> explain query plan select distinct length(s) from t;
selectid|order|from|detail
0|0|0|SCAN TABLE t USING COVERING INDEX lc


If you look at the explain output you can see that the expression index is still opening the main table in addition to the index. I believe using the actual value stored in the index is either a future optimization, or only works for functions which are explicitly marked as deterministic. Someone else can provide some more light on that.


sqlite> explain select distinct length(s) from t indexed by l;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     13    0                    00  Start at 13
1     Null           1     2     0                    08  r[2]=NULL
2     OpenRead       0     2     0     2              00  root=2 iDb=0; t
3     OpenRead       2     4     0     k(2,,)         00  root=4 iDb=0; l
4     Explain        0     0     0     SCAN TABLE t USING INDEX l  00
5     Rewind         2     12    1     0              00
6       DeferredSeek   2     0     0                    00  Move 0 to 2.rowid if needed
7       Column         2     0     1                    00  r[1]=
8       Eq             1     11    2                    80  if r[2]==r[1] goto 11
9       Copy           1     2     0                    00  r[2]=r[1]
10      ResultRow      1     1     0                    00  output=r[1]
11    Next           2     6     0                    01
12    Halt           0     0     0                    00
13    Transaction    0     0     4     0              01  usesStmtJournal=0
14    Goto           0     1     0                    00
Run Time: real 0.020 user 0.000000 sys 0.000000

sqlite> explain select distinct length(s) from t indexed by lc;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     11    0                    00  Start at 11
1     Null           1     2     0                    08  r[2]=NULL
2     OpenRead       2     5     0     k(3,,,)        00  root=5 iDb=0; lc
3     Explain        0     0     0     SCAN TABLE t USING COVERING INDEX lc  00
4     Rewind         2     10    1     0              00
5       Column         2     0     1                    00  r[1]=
6       Eq             1     9     2                    80  if r[2]==r[1] goto 9
7       Copy           1     2     0                    00  r[2]=r[1]
8       ResultRow      1     1     0                    00  output=r[1]
9     Next           2     5     0                    01
10    Halt           0     0     0                    00
11    Transaction    0     0     4     0              01  usesStmtJournal=0
12    Goto           0     1     0                    00
Run Time: real 0.019 user 0.000000 sys 0.000000

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Wout Mertens
Sent: Wednesday, August 09, 2017 2:59 PM
To: SQLite mailing list
Subject: Re: [sqlite] calculated-value indexes are not covering?

but… index s is covering and only includes the field s? I thought a
covering index was one where all the data needed to satisfy the query is in
index? I would say that all the indexes here conform to that definition?

https://sqlite.org/optoverview.html 8.0 covering index

> If, however, all columns that were to be fetched from the table are
already available in the index itself, SQLite will use the values contained
in the index and will never look up the original table row

On Wed, Aug 9, 2017 at 8:55 PM Nico Williams <[hidden email]> wrote:

> On Wed, Aug 09, 2017 at 06:48:51PM +0000, Wout Mertens wrote:
> > sqlite> create table t(j json, s string);
> > sqlite> create index s on t(s);
> > sqlite> create index j on t(json_extract(j, '$.foo'));
> > sqlite> create index l on t(length(s));
>
> In order for any of these indices to be covering indices you need to add
> all the columns of the table t to them:
>
> sqlite> create table t(j json, s string);
> sqlite> create index s on t(s, j);
> sqlite> create index j on t(json_extract(j, '$.foo'), j, s);
> sqlite> create index l on t(length(s), s, j);
>
> Nico
> --
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

Nico Williams
In reply to this post by wmertens
On Wed, Aug 09, 2017 at 06:59:18PM +0000, Wout Mertens wrote:
> but… index s is covering and only includes the field s? I thought a
> covering index was one where all the data needed to satisfy the query is in
> index? I would say that all the indexes here conform to that definition?

No, "covering" means that the columns listed in the index include all
the columns from the source table that you need for a given query:

  CREATE TABLE t(j TEXT, s TEXT, foo TEXT);
  SELECT s, j FROM t WHERE s = 'foo'; -- full table scan bc [s] is not
                                      -- indexed, is not a PRIMARY KEY,
                                      -- and is not UNIQUE
  CREATE INDEX t1 ON t(s);
  SELECT s FROM t WHERE s = 'foo';    -- uses index; index covers column
                                      -- selection (just [s])

  SELECT s, j FROM t WHERE s = 'foo'; -- full table scan unless [s] is
                                      -- a PRIMARY KEY

  CREATE INDEX t2 ON t(j, s);
  SELECT s, j FROM t WHERE s = 'foo'; -- full table scan; t2 doesn't
                                      -- help because we need a covering
                                      -- index where [s] is a prefix

  CREATE INDEX t3 ON t(s, j);
  SELECT s, j FROM t WHERE s = 'foo'; -- uses covering index t3, finally

  SELECT s, j, foo FROM t WHERE s = 'foo'; -- t3 does not cover -> full
                                           -- table scan

Usually you should have a PRIMARY KEY, and if [s] were one here, then
none of these would need full table scans, but only two of these would
use only an index and not also index the table via the PK.

  -- truly covering index, but only usable in queries by [s] or [s],
  -- [j], or [s], [j], [foo]:
  CREATE INDEX t4 ON t(s, j, foo);

Nico
--
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

wmertens
Nico, I respectfully disagree, if you look at my first post you can see
that the first query does consider that single value index on s covering.
Indeed all the indexes here have all the required data to be covering for
their queries.

As David says, it seems there is a missed optimization opportunity here.
When looking at the EXPLAIN output, the non-covering index version does not
seem to actually use the table value it is copying, but I'm having a hard
time decyphering it.

On large tables, this is the difference between a 4ms search and a 50ms
search…

On Wed, Aug 9, 2017, 9:29 PM Nico Williams <[hidden email]> wrote:

> On Wed, Aug 09, 2017 at 06:59:18PM +0000, Wout Mertens wrote:
> > but… index s is covering and only includes the field s? I thought a
> > covering index was one where all the data needed to satisfy the query is
> in
> > index? I would say that all the indexes here conform to that definition?
>
> No, "covering" means that the columns listed in the index include all
> the columns from the source table that you need for a given query:
>
>   CREATE TABLE t(j TEXT, s TEXT, foo TEXT);
>   SELECT s, j FROM t WHERE s = 'foo'; -- full table scan bc [s] is not
>                                       -- indexed, is not a PRIMARY KEY,
>                                       -- and is not UNIQUE
>   CREATE INDEX t1 ON t(s);
>   SELECT s FROM t WHERE s = 'foo';    -- uses index; index covers column
>                                       -- selection (just [s])
>
>   SELECT s, j FROM t WHERE s = 'foo'; -- full table scan unless [s] is
>                                       -- a PRIMARY KEY
>
>   CREATE INDEX t2 ON t(j, s);
>   SELECT s, j FROM t WHERE s = 'foo'; -- full table scan; t2 doesn't
>                                       -- help because we need a covering
>                                       -- index where [s] is a prefix
>
>   CREATE INDEX t3 ON t(s, j);
>   SELECT s, j FROM t WHERE s = 'foo'; -- uses covering index t3, finally
>
>   SELECT s, j, foo FROM t WHERE s = 'foo'; -- t3 does not cover -> full
>                                            -- table scan
>
> Usually you should have a PRIMARY KEY, and if [s] were one here, then
> none of these would need full table scans, but only two of these would
> use only an index and not also index the table via the PK.
>
>   -- truly covering index, but only usable in queries by [s] or [s],
>   -- [j], or [s], [j], [foo]:
>   CREATE INDEX t4 ON t(s, j, foo);
>
> Nico
> --
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

wmertens
So, am I correct in thinking that an index on expressions already has all
the required data to answer e.g. a SELECT DISTINCT?

If so, that could be an optimization?

Can I request this optimization to be made? :)

Thanks,

Wout.

On Thu, Aug 10, 2017, 7:47 AM Wout Mertens <[hidden email]> wrote:

> Nico, I respectfully disagree, if you look at my first post you can see
> that the first query does consider that single value index on s covering.
> Indeed all the indexes here have all the required data to be covering for
> their queries.
>
> As David says, it seems there is a missed optimization opportunity here.
> When looking at the EXPLAIN output, the non-covering index version does not
> seem to actually use the table value it is copying, but I'm having a hard
> time decyphering it.
>
> On large tables, this is the difference between a 4ms search and a 50ms
> search…
>
> On Wed, Aug 9, 2017, 9:29 PM Nico Williams <[hidden email]> wrote:
>
>> On Wed, Aug 09, 2017 at 06:59:18PM +0000, Wout Mertens wrote:
>> > but… index s is covering and only includes the field s? I thought a
>> > covering index was one where all the data needed to satisfy the query
>> is in
>> > index? I would say that all the indexes here conform to that definition?
>>
>> No, "covering" means that the columns listed in the index include all
>> the columns from the source table that you need for a given query:
>>
>>   CREATE TABLE t(j TEXT, s TEXT, foo TEXT);
>>   SELECT s, j FROM t WHERE s = 'foo'; -- full table scan bc [s] is not
>>                                       -- indexed, is not a PRIMARY KEY,
>>                                       -- and is not UNIQUE
>>   CREATE INDEX t1 ON t(s);
>>   SELECT s FROM t WHERE s = 'foo';    -- uses index; index covers column
>>                                       -- selection (just [s])
>>
>>   SELECT s, j FROM t WHERE s = 'foo'; -- full table scan unless [s] is
>>                                       -- a PRIMARY KEY
>>
>>   CREATE INDEX t2 ON t(j, s);
>>   SELECT s, j FROM t WHERE s = 'foo'; -- full table scan; t2 doesn't
>>                                       -- help because we need a covering
>>                                       -- index where [s] is a prefix
>>
>>   CREATE INDEX t3 ON t(s, j);
>>   SELECT s, j FROM t WHERE s = 'foo'; -- uses covering index t3, finally
>>
>>   SELECT s, j, foo FROM t WHERE s = 'foo'; -- t3 does not cover -> full
>>                                            -- table scan
>>
>> Usually you should have a PRIMARY KEY, and if [s] were one here, then
>> none of these would need full table scans, but only two of these would
>> use only an index and not also index the table via the PK.
>>
>>   -- truly covering index, but only usable in queries by [s] or [s],
>>   -- [j], or [s], [j], [foo]:
>>   CREATE INDEX t4 ON t(s, j, foo);
>>
>> Nico
>> --
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

Richard Hipp-3
On 8/11/17, Wout Mertens <[hidden email]> wrote:
> So, am I correct in thinking that an index on expressions already has all
> the required data to answer e.g. a SELECT DISTINCT?
>
> If so, that could be an optimization?
>
> Can I request this optimization to be made? :)
>

That optimization is already made.  Look at the byte-code:

CREATE TABLE t1(a,b,c);
CREATE INDEX t1x1 ON t1(length(b));
.eqp full
SELECT DISTINCT length(b) FROM t1;

The problem is that when the "Explain" instruction (which gives the
"explain query plan" output) is generated, the query planner does not
yet realize that it can get by with only using the index.  It never
actually uses the original table - it only uses the index - but at the
time that "Explain" is generated, that fact is unknown.  And so the
EXPLAIN QUERY PLAN output is not 100% accurate.

I think fixing that should be low-priority.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

wmertens
Aha ok, great!

Now, forgive me, but there is still a difference in the byte code, and I'm
having a hard time decyphering it. The difference from the non-distinct and
the distinct is:

* The table t1 is opened
* There is an extra Seek in the scanning loop, it looks like it moves the
read pointer of table t1 to the indexed rowid. There are no reads
occurring, but won't the seek cause I/O?

See log below.

sqlite> create index tb on t1(b);
sqlite> select distinct b from t1;
--EQP-- 0,0,0,SCAN TABLE t1 USING COVERING INDEX tb
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     11    0                    00  Start at 11
1     Null           1     2     0                    08  r[2]=NULL
2     OpenRead       2     4     0     k(2,,)         00  root=4 iDb=0; tb
3     Explain        0     0     0     SCAN TABLE t1 USING COVERING INDEX
tb  00
4     Rewind         2     10    1     0              00
5       Column         2     0     1                    00  r[1]=t1.b
6       Eq             1     9     2     (BINARY)       80  if r[2]==r[1]
goto 9
7       Copy           1     2     0                    00  r[2]=r[1]
8       ResultRow      1     1     0                    00  output=r[1]
9     Next           2     5     0                    01
10    Halt           0     0     0                    00
11    Transaction    0     0     3     0              01  usesStmtJournal=0
12    Goto           0     1     0                    00
Error: index tb already exists
sqlite> select distinct length(b) from t1;
--EQP-- 0,0,0,SCAN TABLE t1 USING INDEX t
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     13    0                    00  Start at 13
1     Null           1     2     0                    08  r[2]=NULL
2     OpenRead       0     2     0     2              00  root=2 iDb=0; t1
3     OpenRead       2     3     0     k(2,,)         00  root=3 iDb=0; t
4     Explain        0     0     0     SCAN TABLE t1 USING INDEX t  00
5     Rewind         2     12    1     0              00
6       Seek           2     0     0                    00  Move 0 to
2.rowid
7       Column         2     0     1                    00  r[1]=
8       Eq             1     11    2                    80  if r[2]==r[1]
goto 11
9       Copy           1     2     0                    00  r[2]=r[1]
10      ResultRow      1     1     0                    00  output=r[1]
11    Next           2     6     0                    01
12    Halt           0     0     0                    00
13    Transaction    0     0     3     0              01  usesStmtJournal=0
14    Goto           0     1     0                    00



On Fri, Aug 11, 2017 at 1:31 PM Richard Hipp <[hidden email]> wrote:

> On 8/11/17, Wout Mertens <[hidden email]> wrote:
> > So, am I correct in thinking that an index on expressions already has all
> > the required data to answer e.g. a SELECT DISTINCT?
> >
> > If so, that could be an optimization?
> >
> > Can I request this optimization to be made? :)
> >
>
> That optimization is already made.  Look at the byte-code:
>
> CREATE TABLE t1(a,b,c);
> CREATE INDEX t1x1 ON t1(length(b));
> .eqp full
> SELECT DISTINCT length(b) FROM t1;
>
> The problem is that when the "Explain" instruction (which gives the
> "explain query plan" output) is generated, the query planner does not
> yet realize that it can get by with only using the index.  It never
> actually uses the original table - it only uses the index - but at the
> time that "Explain" is generated, that fact is unknown.  And so the
> EXPLAIN QUERY PLAN output is not 100% accurate.
>
> I think fixing that should be low-priority.
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

Richard Hipp-3
On 8/11/17, Wout Mertens <[hidden email]> wrote:
> Aha ok, great!
>
> Now, forgive me, but there is still a difference in the byte code, and I'm
> having a hard time decyphering it. The difference from the non-distinct and
> the distinct is:
>
> * The table t1 is opened

Yes, there is a little work done to open a cursor on the main table,
but that cursor is never used.

The reason that the cursor is opened is because at the top of the loop
where the cursor needs to be opened, the query planner still does not
realize that it does not actually need the cursor.

> * There is an extra Seek in the scanning loop, it looks like it moves the
> read pointer of table t1 to the indexed rowid. There are no reads
> occurring, but won't the seek cause I/O?

That seek is deferred.  In fact, we changed the name of the opcode to
DeferredSeek in 3.20.0 to avoid confusion.  The actual work of seeking
on the main table cursor is deferred until there is a read request on
that cursor.  And no read request every occurs, and so the seek never
happens.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: calculated-value indexes are not covering?

wmertens
That clears it up, many thanks!

On Fri, Aug 11, 2017 at 2:32 PM Richard Hipp <[hidden email]> wrote:

> On 8/11/17, Wout Mertens <[hidden email]> wrote:
> > Aha ok, great!
> >
> > Now, forgive me, but there is still a difference in the byte code, and
> I'm
> > having a hard time decyphering it. The difference from the non-distinct
> and
> > the distinct is:
> >
> > * The table t1 is opened
>
> Yes, there is a little work done to open a cursor on the main table,
> but that cursor is never used.
>
> The reason that the cursor is opened is because at the top of the loop
> where the cursor needs to be opened, the query planner still does not
> realize that it does not actually need the cursor.
>
> > * There is an extra Seek in the scanning loop, it looks like it moves the
> > read pointer of table t1 to the indexed rowid. There are no reads
> > occurring, but won't the seek cause I/O?
>
> That seek is deferred.  In fact, we changed the name of the opcode to
> DeferredSeek in 3.20.0 to avoid confusion.  The actual work of seeking
> on the main table cursor is deferred until there is a read request on
> that cursor.  And no read request every occurs, and so the seek never
> happens.
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users