SELECT uses index with SUBSTR but UPDATE doesn't

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

SELECT uses index with SUBSTR but UPDATE doesn't

Brannon King
I have this query:
UPDATE nodes SET parent = ? WHERE SUBSTR(name, 0, ?) = ?
EXPLAIN QUERY PLAN tells me that it is going to do a table scan. At the
same time, the query plan for this:
SELECT * FROM nodes WHERE SUBSTR(name, 0, ?) = ?
tells me that it can and will use the (primary key) index on the name
column.

With that info, I thought that this query would be faster:
UPDATE nodes SET parent = ? WHERE name IN (SELECT name FROM nodes WHERE
SUBSTR(name, 0, ?) = ?)
Alas, it's not. I don't know why.

UPDATE will use the index if I use the LIKE operator. However, it won't use
the index if I attempt LIKE (? || '%'). Whatever handles the string
concatenation breaks the use of the index. I don't want to have to sanitize
my own data. I have very arbitrary, user-entered, malicious data going in.
It's also not clear to me what the sanitizer does for the LIKE operator.
What does it do to existing percent signs in the data? I don't want to use
those as wildcards. Hence, I much prefer the SUBSTR approach; it seems much
safer all around.

I run v3.29.0. I hope this can prompt somebody to make the SUBSTR operator
work with the indexes on an UPDATE statement.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT uses index with SUBSTR but UPDATE doesn't

Keith Medcalf

Unable to reproduce.  In particular:

>SELECT * FROM nodes WHERE SUBSTR(name, 0, ?) = ?
>tells me that it can and will use the (primary key) index on the name
>column.

will not use the index.  I can make it use an index by doctoring the table data to make the index scan cheaper than a table scan, but it is still doing a table scan, just doing it by scanning the index because it is cheaper.

SUBSTR(name, 0, ?) is an expression, so unless you have an index on that expression, then an index cannot be used to SEARCH for the rows.  You would have to use a constant for the length, however, not a parameter.  Nonetheless, an index on the entire column may be used to SCAN if scanning the index is cheaper than scanning the table.

>With that info, I thought that this query would be faster:
>UPDATE nodes SET parent = ? WHERE name IN (SELECT name FROM nodes WHERE
>SUBSTR(name, 0, ?) = ?)
>Alas, it's not. I don't know why.

Because you are still having to scan the entire table, whether directly, or by scanning an index that starts with the name column.

Of course, without knowing the definition of the table nodes, it is pretty difficult to know anything at all, since there are quite a lot of dependencies on the declaration of the table (is it a rowid table, a without rowid table, is name the primary key or merely a unique index, and what is the collation of name).

>UPDATE will use the index if I use the LIKE operator. However, it won't
>use the index if I attempt LIKE (? || '%'). Whatever handles the string
>concatenation breaks the use of the index.

Exactly so.  The rules for when an index can be used reveals that an index may be used when the various conditions are met, including the condition that the RHS of the LIKE is a bound parameter or constant text that does not start with a '-' or a number.  An expression is neither a parameter nor constant text so the LIKE optimization will not be attempted.  

https://sqlite.org/optoverview.html#the_like_optimization

You could always simply perform the concatenation in your application and use the like operator everywhere.  Then you could be sure that a range scan of the applicable index would always be used provided that the parameter does not start with a '-' or a number (or a wildcard).

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Brannon King
>Sent: Tuesday, 8 October, 2019 15:53
>To: [hidden email]
>Subject: [sqlite] SELECT uses index with SUBSTR but UPDATE doesn't
>
>I have this query:
>UPDATE nodes SET parent = ? WHERE SUBSTR(name, 0, ?) = ?
>EXPLAIN QUERY PLAN tells me that it is going to do a table scan. At the
>same time, the query plan for this:
>SELECT * FROM nodes WHERE SUBSTR(name, 0, ?) = ?
>tells me that it can and will use the (primary key) index on the name
>column.
>
>With that info, I thought that this query would be faster:
>UPDATE nodes SET parent = ? WHERE name IN (SELECT name FROM nodes WHERE
>SUBSTR(name, 0, ?) = ?)
>Alas, it's not. I don't know why.
>
>UPDATE will use the index if I use the LIKE operator. However, it won't
>use
>the index if I attempt LIKE (? || '%'). Whatever handles the string
>concatenation breaks the use of the index. I don't want to have to
>sanitize
>my own data. I have very arbitrary, user-entered, malicious data going
>in.
>It's also not clear to me what the sanitizer does for the LIKE operator.
>What does it do to existing percent signs in the data? I don't want to
>use
>those as wildcards. Hence, I much prefer the SUBSTR approach; it seems
>much
>safer all around.
>
>I run v3.29.0. I hope this can prompt somebody to make the SUBSTR
>operator
>work with the indexes on an UPDATE statement.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT uses index with SUBSTR but UPDATE doesn't

Jens Alfke-2


> On Oct 9, 2019, at 10:02 AM, Keith Medcalf <[hidden email]> wrote:
>
> SUBSTR(name, 0, ?) is an expression, so unless you have an index on that expression, then an index cannot be used to SEARCH for the rows.

That's accurate in general. However, there _is_ a very similar special-case optimization for the `like` function/operator when it's used to look for a prefix. (With all sorts of caveats having to do with which specific patterns represent prefix matches, edge conditions about data types, restrictions on collation and overloading of 'like'…) So I can see why it would be easy to think that SUBSTR has a similar optimization.

It seems like there's a need for an optimizable string-prefix predicate, i.e. `str BEGINS WITH prefix` or `begins_with(str, prefix)`, to allow the developer to be more explicit about what they mean. Prefix matching is extremely common in some use cases. (One of our customers raised an issue about this a few days ago…)

BETWEEN doesn't work well because it's inclusive, i.e. `BETWEEN 'foo' and 'fop'` doesn't work because it matches 'fop'. Coming up with the upper end of a string prefix match is super annoying — `BETWEEN 'foo' and 'foo\xff' only works until some wise guy adds the key `foo\xff` to the table, and is invalid UTF-8 anyway.

—Jens
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT uses index with SUBSTR but UPDATE doesn't

Keith Medcalf

On Wednesday, 9 October, 2019 12:01, Jens Alfke <[hidden email]> said:

>BETWEEN doesn't work well because it's inclusive, i.e. `BETWEEN 'foo' and
>'fop'` doesn't work because it matches 'fop'. Coming up with the upper
>end of a string prefix match is super annoying — `BETWEEN 'foo' and
>'foo\xff' only works until some wise guy adds the key `foo\xff` to the
>table, and is invalid UTF-8 anyway.

I don't think that the UTF-8 point is meaningful.  However, If *I* were in need of doing this I would use a construct that looks like this:

where (name between :prefix and (:prefix || char(255)) and substr(name, 1, length(:prefix)) collate nocase == :prefix)

where the "name between :prefix and (:prefix || char(255))" constrains the index search and "substr(name, 1, length(:prefix)) collate nocase == :prefix" constrains the resulting candidates.

(Replace all instances of "collate nocase" with the collation you want to use, omit entirely to use the default BINARY collation).

>sqlite
SQLite version 3.30.0 2019-10-09 16:25:22
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table nodes(name text not null collate nocase unique);
sqlite> insert into nodes values ('dangtalk');
sqlite> insert into nodes values ('dingdong');
sqlite> insert into nodes values ('dingwhit');
sqlite> insert into nodes values ('dongdong');
sqlite> .param init
sqlite> .param set :prefix 'DING'
sqlite> .eqp full
sqlite> select * from nodes where name between :prefix and (:prefix || char(255)) and substr(name,1,length(:prefix)) collate nocase == :prefix;
QUERY PLAN
`--SEARCH TABLE nodes USING COVERING INDEX sqlite_autoindex_nodes_1 (name>? AND name<?) (~15360 rows)
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     24    0                    00  Start at 24
1     OpenRead       1     3     0     k(1,NOCASE)    00  root=3 iDb=0; sqlite_autoindex_nodes_1
2     ColumnsUsed    1     0     0     1              00
3     Explain        3     0     0     SEARCH TABLE nodes USING COVERING INDEX sqlite_autoindex_nodes_1 (name>? AND name<?) (~15360 rows)  00
4     Noop           0     0     0                    00  Begin WHERE-loop0: nodes
5     CursorHint     1     0     0     AND(expr,EQ(expr,expr))  00
6     Variable       1     1     0     :prefix        00  r[1]=parameter(1,:prefix)
7     IsNull         1     22    0                    00  if r[1]==NULL goto 22
8     Affinity       1     1     0     B              00  affinity(r[1])
9     SeekGE         1     22    1     1              00  key=r[1]
10    Concat         3     2     1                    00  r[1]=r[2]+r[3]
11    IsNull         1     22    0                    00  if r[1]==NULL goto 22
12    Affinity       1     1     0     B              00  affinity(r[1])
13      IdxGT          1     22    1     1              00  key=r[1]
14      Column         1     0     5                    00  r[5]=nodes.name
15      Function0      6     5     4     substr(3)      03  r[4]=func(r[5..7])
16      Ne             2     21    4     (NOCASE)       50  if r[4]!=r[2] goto 21
17      Noop           0     0     0                    00  Begin WHERE-core
18      Column         1     0     8                    00  r[8]=nodes.name
19      ResultRow      8     1     0                    00  output=r[8]
20      Noop           0     0     0                    00  End WHERE-core
21    Next           1     13    0                    00
22    Noop           0     0     0                    00  End WHERE-loop0: nodes
23    Halt           0     0     0                    00
24    Transaction    0     0     1     0              01  usesStmtJournal=0
25    Variable       1     2     0     :prefix        00  r[2]=parameter(1,:prefix)
26    Integer        255   9     0                    00  r[9]=255
27    Function0      1     9     3     char(-1)       01  r[3]=func(r[9])
28    Integer        1     6     0                    00  r[6]=1
29    Variable       1     10    0     :prefix        00  r[10]=parameter(1,:prefix)
30    Function0      1     10    7     length(1)      01  r[7]=func(r[10])
31    Goto           0     1     0                    00
dingdong
dingwhit

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: SELECT uses index with SUBSTR but UPDATE doesn't

Keith Medcalf

minor correction, char(255) should be x'ff' ... char(255) does not return the byte 0xff, but x'ff' is the byte 0xff ...

where (name between :prefix and (:prefix || x'ff') and substr(name, 1, length(:prefix)) collate nocase == :prefix)

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <[hidden email]> On
>Behalf Of Keith Medcalf
>Sent: Wednesday, 9 October, 2019 13:04
>To: SQLite mailing list <[hidden email]>
>Subject: Re: [sqlite] SELECT uses index with SUBSTR but UPDATE doesn't
>
>
>On Wednesday, 9 October, 2019 12:01, Jens Alfke <[hidden email]>
>said:
>
>>BETWEEN doesn't work well because it's inclusive, i.e. `BETWEEN 'foo'
>and
>>'fop'` doesn't work because it matches 'fop'. Coming up with the upper
>>end of a string prefix match is super annoying — `BETWEEN 'foo' and
>>'foo\xff' only works until some wise guy adds the key `foo\xff` to the
>>table, and is invalid UTF-8 anyway.
>
>I don't think that the UTF-8 point is meaningful.  However, If *I* were
>in need of doing this I would use a construct that looks like this:
>
>where (name between :prefix and (:prefix || char(255)) and substr(name,
>1, length(:prefix)) collate nocase == :prefix)
>
>where the "name between :prefix and (:prefix || char(255))" constrains
>the index search and "substr(name, 1, length(:prefix)) collate nocase ==
>:prefix" constrains the resulting candidates.
>
>(Replace all instances of "collate nocase" with the collation you want to
>use, omit entirely to use the default BINARY collation).
>
>>sqlite
>SQLite version 3.30.0 2019-10-09 16:25:22
>Enter ".help" for usage hints.
>Connected to a transient in-memory database.
>Use ".open FILENAME" to reopen on a persistent database.
>sqlite> create table nodes(name text not null collate nocase unique);
>sqlite> insert into nodes values ('dangtalk');
>sqlite> insert into nodes values ('dingdong');
>sqlite> insert into nodes values ('dingwhit');
>sqlite> insert into nodes values ('dongdong');
>sqlite> .param init
>sqlite> .param set :prefix 'DING'
>sqlite> .eqp full
>sqlite> select * from nodes where name between :prefix and (:prefix ||
>char(255)) and substr(name,1,length(:prefix)) collate nocase == :prefix;
>QUERY PLAN
>`--SEARCH TABLE nodes USING COVERING INDEX sqlite_autoindex_nodes_1
>(name>? AND name<?) (~15360 rows)
>addr  opcode         p1    p2    p3    p4             p5  comment
>----  -------------  ----  ----  ----  -------------  --  -------------
>0     Init           0     24    0                    00  Start at 24
>1     OpenRead       1     3     0     k(1,NOCASE)    00  root=3 iDb=0;
>sqlite_autoindex_nodes_1
>2     ColumnsUsed    1     0     0     1              00
>3     Explain        3     0     0     SEARCH TABLE nodes USING COVERING
>INDEX sqlite_autoindex_nodes_1 (name>? AND name<?) (~15360 rows)  00
>4     Noop           0     0     0                    00  Begin WHERE-
>loop0: nodes
>5     CursorHint     1     0     0     AND(expr,EQ(expr,expr))  00
>6     Variable       1     1     0     :prefix        00
>r[1]=parameter(1,:prefix)
>7     IsNull         1     22    0                    00  if r[1]==NULL
>goto 22
>8     Affinity       1     1     0     B              00  affinity(r[1])
>9     SeekGE         1     22    1     1              00  key=r[1]
>10    Concat         3     2     1                    00  r[1]=r[2]+r[3]
>11    IsNull         1     22    0                    00  if r[1]==NULL
>goto 22
>12    Affinity       1     1     0     B              00  affinity(r[1])
>13      IdxGT          1     22    1     1              00  key=r[1]
>14      Column         1     0     5                    00
>r[5]=nodes.name
>15      Function0      6     5     4     substr(3)      03
>r[4]=func(r[5..7])
>16      Ne             2     21    4     (NOCASE)       50  if r[4]!=r[2]
>goto 21
>17      Noop           0     0     0                    00  Begin WHERE-
>core
>18      Column         1     0     8                    00
>r[8]=nodes.name
>19      ResultRow      8     1     0                    00  output=r[8]
>20      Noop           0     0     0                    00  End WHERE-
>core
>21    Next           1     13    0                    00
>22    Noop           0     0     0                    00  End WHERE-
>loop0: nodes
>23    Halt           0     0     0                    00
>24    Transaction    0     0     1     0              01
>usesStmtJournal=0
>25    Variable       1     2     0     :prefix        00
>r[2]=parameter(1,:prefix)
>26    Integer        255   9     0                    00  r[9]=255
>27    Function0      1     9     3     char(-1)       01  r[3]=func(r[9])
>28    Integer        1     6     0                    00  r[6]=1
>29    Variable       1     10    0     :prefix        00
>r[10]=parameter(1,:prefix)
>30    Function0      1     10    7     length(1)      01
>r[7]=func(r[10])
>31    Goto           0     1     0                    00
>dingdong
>dingwhit
>
>--
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users