Yet Another Why Doesn't Sqlite Use My Index question ...

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
Hi,

We've been struggling with a complex query that we have written. One of
the elements of this complex query is a select statement that doesn't
appear to use an index when we *think* it should do. We're not going to
ask you to debug a large complex SQL query (unless you have nothing else
to do today) but we're working our way through the query trying to
understand where we've gone wrong. So we've broken down the query and
are trying each section to see what it performs (or rather doesn't
perform like).

The sub query is a simple select on a large table, Disruptions. The
Disruptions table has 180M rows of data.

The schema for it is here. We've pulled it straight from Navicat for
SQLite.

```
CREATE TABLE "Disruptions" (
         "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
         "version" integer NOT NULL,
         "Disruption_id" INTEGER NOT NULL,
         "status" integer NOT NULL,
         "severity" integer NOT NULL,
         "levelOfInterest" integer NOT NULL,
         "category" integer NOT NULL,
         "subCategory" integer NOT NULL,
         "startTime" TEXT NOT NULL,
         "endTime" text NOT NULL,
         "location" integer NOT NULL,
         "corridor" integer NOT NULL,
         "comments" integer NOT NULL,
         "currentUpdate" integer NOT NULL,
         "remarkTime" TEXT NOT NULL,
         "lastModTime" TEXT NOT NULL,
         "CauseAreaPointX" real NOT NULL,
         "CauseAreaPointY" real NOT NULL,
         "Direction" TEXT
);
INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("Disruptions",
'184626834');

-- ----------------------------
--  Indexes structure for table Disruptions
-- ----------------------------
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC,
"category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id"
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE
NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE
NOCASE ASC);
CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE
ASC);

PRAGMA foreign_keys = true;
```

As part of the larger more complex query, we are executing the query

```
select * from Disruptions where status = 2 OR status = 6;
```

Status is one of six values, 1 to 6 and is probably not evenly
distributed across the 180M rows.

If we do

```
sqlite> explain query plan select * from Disruptions where status = 2 OR
status = 6;
selectid|order|from|detail
0|0|0|SCAN TABLE Disruptions
```

We can see that table scanning a 180M records is going to be slow, no
matter what the rest of the complex query is like.

We have an index Disruptions_idx4 which we *think* should speed it up,
but the query plan doesn't seem to take this into account.

We think that only having six values of Status means that the speed up
from the index is not going to be fantastic but every little helps.

We have run analyze on the database and that hasn't helped.

Our gut feeling at this moment is that we have the design structure
wrong in our DB and we're going to have to take a long hard look at what
we're doing, do a redesign and a rebuild as we simply got it wrong from
the beginning. Hindsight is wonderful ;) In the interim (as this is a
2-3 month job), we need to speed our query up from 90 mins down to
something in the tens of mins.

Any suggestions very much welcomed,

Thanks

Rob
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Simon Slavin-3

On 17 Mar 2017, at 10:20am, Rob Willett <[hidden email]> wrote:

> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);
>
> […]
>
> As part of the larger more complex query, we are executing the query
>
> ```
> select * from Disruptions where status = 2 OR status = 6;

The schema for the table says that "status" is INTEGER.
You are supplying numbers as arguments.
Those two match and should create no problem.

But your index has a collation order which is usually used for text.  I don’t see that it is obviously wrong, but it does look a little weird.

Try creating another index which is just on "status", without the COLLATE clause.
Then do another ANALYZE, then try the SELECT again.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
Simon,

Thank you for the quick response. We'll do this. We're just waiting for
another analyze to finish. Our current DB is 45GB, the query works OK on
smaller databases, but at this size its very slow, thats why we have a
bad feeling that we're tinkering on the edges and we need a new DB
design :(

The reason for the collation is the way NavCat SQLite works, its
difficult to get it without the collate but we'll do this direct from
the command line.

We'll post back in an hour or so as it'll probably take that long to run
:)

Rob

On 17 Mar 2017, at 10:27, Simon Slavin wrote:

> On 17 Mar 2017, at 10:20am, Rob Willett <[hidden email]>
> wrote:
>
>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
>> NOCASE ASC);
>>
>> […]
>>
>> As part of the larger more complex query, we are executing the query
>>
>> ```
>> select * from Disruptions where status = 2 OR status = 6;
>
> The schema for the table says that "status" is INTEGER.
> You are supplying numbers as arguments.
> Those two match and should create no problem.
>
> But your index has a collation order which is usually used for text.  
> I don’t see that it is obviously wrong, but it does look a little
> weird.
>
> Try creating another index which is just on "status", without the
> COLLATE clause.
> Then do another ANALYZE, then try the SELECT again.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Hick Gunter
In reply to this post by Simon Slavin-3
>On 17 Mar 2017, at 10:20am, Rob Willett <[hidden email]> wrote:
>
>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);
>>
>> […]
>>
>> As part of the larger more complex query, we are executing the query
>>
>> ```
>> select * from Disruptions where status = 2 OR status = 6;
>
>The schema for the table says that "status" is INTEGER.
>You are supplying numbers as arguments.
>Those two match and should create no problem.
>
>But your index has a collation order which is usually used for text.  I don’t see that it is obviously wrong, but it does look a little weird.
>
>Try creating another index which is just on "status", without the COLLATE clause.
>Then do another ANALYZE, then try the SELECT again.
>
>Simon.

If the index is deemed unsuitable by SQLite due to its collation sequence, then I expect it qwould also be ignored in "select ... status=1" (without the second ORed value)

If not, then (select ... where status =2 UNION ALL select where status = 6) should do the trick

Do you really require all the fields from Disruptions?

And yes, collating integers with NOCASE seems quite strange (there are no capital or lowercase numbers unless you are using roman numerals ;) ); for text affinity, it should render the comparison operators caseblind, just like "like".


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Hick Gunter
>On 17 Mar 2017, at 10:20am, Rob Willett <[hidden email]> wrote:
>
>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE ASC);
>>
>> […]
>>
>> As part of the larger more complex query, we are executing the query
>>
>> ```
>> select * from Disruptions where status = 2 OR status = 6;
>
>The schema for the table says that "status" is INTEGER.
>You are supplying numbers as arguments.
>Those two match and should create no problem.
>
>But your index has a collation order which is usually used for text.  I don’t see that it is obviously wrong, but it does look a little weird.
>
>Try creating another index which is just on "status", without the COLLATE clause.
>Then do another ANALYZE, then try the SELECT again.
>
>Simon.

If the index is deemed unsuitable by SQLite due to its collation sequence, then I expect it qwould also be ignored in "select ... status=1" (without the second ORed value)

If not, then (select ... where status =2 UNION ALL select where status = 6) should do the trick

Do you really require all the fields from Disruptions?

And yes, collating integers with NOCASE seems quite strange (there are no capital or lowercase numbers unless you are using roman numerals ;) ); for text affinity, it should render the comparison operators caseblind, just like "like".

PS. Also try "select .. status = 2 COLLATE NOCASE or status = 6 COLLATE NOCASE" just for good measure


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
In reply to this post by Hick Gunter
Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the collation
sequence as a possible issue. We now have a new index and we have just
run the query again

sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 OR
status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think is
good.

If we use your other example

sqlite> explain query plan select * from Disruptions  where status = 2
UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and
see.

Your last suggestion of "select * from Disruptions  where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically
equivalent to "explain query plan select * from Disruptions where status
= 2 OR status = 6;" now we have removed the collation from the index.

sqlite> explain query plan select * from Disruptions  where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields, we require many (which I agree
is not all) of the fields. Following this logic through, does this mean
that it will do more file access bringing the records in from the file
system?

The collation issue seems to be an artifact of the way Navcat for SQLite
works. I suspect we need to be more careful about how we use the tool.

We'll now time the results of each query and run them twice to see the
affect. No idea how long this will take but suspect a few hours :) I
will post back the results as other people may (or may not) find this
helpful.

Thanks

Rob

On 17 Mar 2017, at 10:57, Hick Gunter wrote:

>> On 17 Mar 2017, at 10:20am, Rob Willett
>> <[hidden email]> wrote:
>>
>>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
>>> NOCASE ASC);
>>>
>>> […]
>>>
>>> As part of the larger more complex query, we are executing the query
>>>
>>> ```
>>> select * from Disruptions where status = 2 OR status = 6;
>>
>> The schema for the table says that "status" is INTEGER.
>> You are supplying numbers as arguments.
>> Those two match and should create no problem.
>>
>> But your index has a collation order which is usually used for text.  
>> I don’t see that it is obviously wrong, but it does look a little
>> weird.
>>
>> Try creating another index which is just on "status", without the
>> COLLATE clause.
>> Then do another ANALYZE, then try the SELECT again.
>>
>> Simon.
>
> If the index is deemed unsuitable by SQLite due to its collation
> sequence, then I expect it qwould also be ignored in "select ...
> status=1" (without the second ORed value)
>
> If not, then (select ... where status =2 UNION ALL select where status
> = 6) should do the trick
>
> Do you really require all the fields from Disruptions?
>
> And yes, collating integers with NOCASE seems quite strange (there are
> no capital or lowercase numbers unless you are using roman numerals ;)
> ); for text affinity, it should render the comparison operators
> caseblind, just like "like".
>
>
> ___________________________________________
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: [hidden email]
>
> This communication (including any attachments) is intended for the use
> of the intended recipient(s) only and may contain information that is
> confidential, privileged or legally protected. Any unauthorized use or
> dissemination of this communication is strictly prohibited. If you
> have received this communication in error, please immediately notify
> the sender by return e-mail message and delete all copies of the
> original communication. Thank you for your cooperation.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Hick Gunter
Nerds with chronic byte code affinity like myself would like to see the output of "explain" (without "query plan"), i.e. the SQLite bytecode produced. I guess the query with OR will have a subprogram called once for each status value, whereas I expect the query with UNION ALL to have 2 copies of the search (which would not affect the run time) and maybe even a temporary table of results (which would take longer and use more memory).

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Rob Willett
Gesendet: Freitag, 17. März 2017 12:19
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

Gunter, Simon,

Thanks for the replies, You both seem to be drilling into the collation sequence as a possible issue. We now have a new index and we have just run the query again

sqlite> analyze;
sqlite> drop index Disruptions_idx4;
sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
sqlite> explain query plan select * from Disruptions where status = 2 OR
status = 6;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

So we have a different response from the query planner, which I think is good.

If we use your other example

sqlite> explain query plan select * from Disruptions  where status = 2
UNION ALL select * from Disruptions where status = 6;
selectid|order|from|detail
1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

I'm not sure which query is going to be faster. We'll have to try and see.

Your last suggestion of "select * from Disruptions  where status =2 COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically equivalent to "explain query plan select * from Disruptions where status = 2 OR status = 6;" now we have removed the collation from the index.

sqlite> explain query plan select * from Disruptions  where status =2
COLLATE NOCASE or status = 6 COLLATE NOCASE;
selectid|order|from|detail
0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite>

I'll check if we require all the fields, we require many (which I agree is not all) of the fields. Following this logic through, does this mean that it will do more file access bringing the records in from the file system?

The collation issue seems to be an artifact of the way Navcat for SQLite works. I suspect we need to be more careful about how we use the tool.

We'll now time the results of each query and run them twice to see the affect. No idea how long this will take but suspect a few hours :) I will post back the results as other people may (or may not) find this helpful.

Thanks

Rob

On 17 Mar 2017, at 10:57, Hick Gunter wrote:

>> On 17 Mar 2017, at 10:20am, Rob Willett
>> <[hidden email]> wrote:
>>
>>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
>>> NOCASE ASC);
>>>
>>> […]
>>>
>>> As part of the larger more complex query, we are executing the query
>>>
>>> ```
>>> select * from Disruptions where status = 2 OR status = 6;
>>
>> The schema for the table says that "status" is INTEGER.
>> You are supplying numbers as arguments.
>> Those two match and should create no problem.
>>
>> But your index has a collation order which is usually used for text.
>> I don’t see that it is obviously wrong, but it does look a little
>> weird.
>>
>> Try creating another index which is just on "status", without the
>> COLLATE clause.
>> Then do another ANALYZE, then try the SELECT again.
>>
>> Simon.
>
> If the index is deemed unsuitable by SQLite due to its collation
> sequence, then I expect it qwould also be ignored in "select ...
> status=1" (without the second ORed value)
>
> If not, then (select ... where status =2 UNION ALL select where status
> = 6) should do the trick
>
> Do you really require all the fields from Disruptions?
>
> And yes, collating integers with NOCASE seems quite strange (there are
> no capital or lowercase numbers unless you are using roman numerals ;)
> ); for text affinity, it should render the comparison operators
> caseblind, just like "like".
>
>
> ___________________________________________
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: [hidden email]
>
> This communication (including any attachments) is intended for the use
> of the intended recipient(s) only and may contain information that is
> confidential, privileged or legally protected. Any unauthorized use or
> dissemination of this communication is strictly prohibited. If you
> have received this communication in error, please immediately notify
> the sender by return e-mail message and delete all copies of the
> original communication. Thank you for your cooperation.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
Gunter,

I would never presume to describe anybody as a Nerd!

We're just going back to very first position with the 'bad' collation
index so we can do proper timings as we change things so we understand
the speed up (we hope there is a speed up)

We've written a quick script to check each version. Once we've put the
original index back in, we've added a step to generate the SQLite
bytecode for you. It's the least we can do...

We'll post this when its completed but we suspect it may take most of
the day now :)

echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions
("status" COLLATE NOCASE ASC);'"
echo "explain select * from Disruptions where status = 2 OR status = 6;"
| sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date

echo "-------"

echo "Creating new index without collation"
echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' |
sqlite3 tfl.sqlite
echo "explain select * from Disruptions where status = 2 OR status = 6;"
| sqlite3 tfl.sqlite
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null
date

echo "-------"

echo "Trying SELECT statement with UNION ALL"
echo "explain select * from Disruptions where status = 2 OR status = 6;"
| sqlite3 tfl.sqlite
date
echo "select * from Disruptions  where status = 2 UNION ALL select *
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date
echo "select * from Disruptions  where status = 2 UNION ALL select *
from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
date


On 17 Mar 2017, at 11:30, Hick Gunter wrote:

> Nerds with chronic byte code affinity like myself would like to see
> the output of "explain" (without "query plan"), i.e. the SQLite
> bytecode produced. I guess the query with OR will have a subprogram
> called once for each status value, whereas I expect the query with
> UNION ALL to have 2 copies of the search (which would not affect the
> run time) and maybe even a temporary table of results (which would
> take longer and use more memory).
>
> -----Ursprüngliche Nachricht-----
> Von: sqlite-users
> [mailto:[hidden email]] Im Auftrag von
> Rob Willett
> Gesendet: Freitag, 17. März 2017 12:19
> An: SQLite mailing list <[hidden email]>
> Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index
> question ...
>
> Gunter, Simon,
>
> Thanks for the replies, You both seem to be drilling into the
> collation sequence as a possible issue. We now have a new index and we
> have just run the query again
>
> sqlite> analyze;
> sqlite> drop index Disruptions_idx4;
> sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
> sqlite> explain query plan select * from Disruptions where status = 2
> OR
> status = 6;
> selectid|order|from|detail
> 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
> 0|0|0|EXECUTE LIST SUBQUERY 1
> sqlite>
>
> So we have a different response from the query planner, which I think
> is good.
>
> If we use your other example
>
> sqlite> explain query plan select * from Disruptions  where status = 2
> UNION ALL select * from Disruptions where status = 6;
> selectid|order|from|detail
> 1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
> 2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
> 0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
>
> I'm not sure which query is going to be faster. We'll have to try and
> see.
>
> Your last suggestion of "select * from Disruptions  where status =2
> COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically
> equivalent to "explain query plan select * from Disruptions where
> status = 2 OR status = 6;" now we have removed the collation from the
> index.
>
> sqlite> explain query plan select * from Disruptions  where status =2
> COLLATE NOCASE or status = 6 COLLATE NOCASE;
> selectid|order|from|detail
> 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4 (status=?)
> 0|0|0|EXECUTE LIST SUBQUERY 1
> sqlite>
>
> I'll check if we require all the fields, we require many (which I
> agree is not all) of the fields. Following this logic through, does
> this mean that it will do more file access bringing the records in
> from the file system?
>
> The collation issue seems to be an artifact of the way Navcat for
> SQLite works. I suspect we need to be more careful about how we use
> the tool.
>
> We'll now time the results of each query and run them twice to see the
> affect. No idea how long this will take but suspect a few hours :) I
> will post back the results as other people may (or may not) find this
> helpful.
>
> Thanks
>
> Rob
>
> On 17 Mar 2017, at 10:57, Hick Gunter wrote:
>
>>> On 17 Mar 2017, at 10:20am, Rob Willett
>>> <[hidden email]> wrote:
>>>
>>>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
>>>> NOCASE ASC);
>>>>
>>>> […]
>>>>
>>>> As part of the larger more complex query, we are executing the
>>>> query
>>>>
>>>> ```
>>>> select * from Disruptions where status = 2 OR status = 6;
>>>
>>> The schema for the table says that "status" is INTEGER.
>>> You are supplying numbers as arguments.
>>> Those two match and should create no problem.
>>>
>>> But your index has a collation order which is usually used for text.
>>> I don’t see that it is obviously wrong, but it does look a little
>>> weird.
>>>
>>> Try creating another index which is just on "status", without the
>>> COLLATE clause.
>>> Then do another ANALYZE, then try the SELECT again.
>>>
>>> Simon.
>>
>> If the index is deemed unsuitable by SQLite due to its collation
>> sequence, then I expect it qwould also be ignored in "select ...
>> status=1" (without the second ORed value)
>>
>> If not, then (select ... where status =2 UNION ALL select where
>> status
>> = 6) should do the trick
>>
>> Do you really require all the fields from Disruptions?
>>
>> And yes, collating integers with NOCASE seems quite strange (there
>> are
>> no capital or lowercase numbers unless you are using roman numerals
>> ;)
>> ); for text affinity, it should render the comparison operators
>> caseblind, just like "like".
>>
>>
>> ___________________________________________
>>  Gunter Hick
>> Software Engineer
>> Scientific Games International GmbH
>> FN 157284 a, HG Wien
>> Klitschgasse 2-4, A-1130 Vienna, Austria
>> Tel: +43 1 80100 0
>> E-Mail: [hidden email]
>>
>> This communication (including any attachments) is intended for the
>> use
>> of the intended recipient(s) only and may contain information that is
>> confidential, privileged or legally protected. Any unauthorized use
>> or
>> dissemination of this communication is strictly prohibited. If you
>> have received this communication in error, please immediately notify
>> the sender by return e-mail message and delete all copies of the
>> original communication. Thank you for your cooperation.
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___________________________________________
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: [hidden email]
>
> This communication (including any attachments) is intended for the use
> of the intended recipient(s) only and may contain information that is
> confidential, privileged or legally protected. Any unauthorized use or
> dissemination of this communication is strictly prohibited. If you
> have received this communication in error, please immediately notify
> the sender by return e-mail message and delete all copies of the
> original communication. Thank you for your cooperation.
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
Thanks to everybody for their help earlier today.

As promised here's the results of our various tests. Hopefully they may
be of use to somebody...

We decided to start from a known position and so recreated the original
index with the collation in it. We know this was sub optimal but its our
reference point. We have the bytecode output if anybody wants to see it.

CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE
ASC);

We ran the the following SQL twice

echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null

and the two runs totalled 46 mins. Each was actually 23 mins.

We then dropped the old index, built the new one

echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' |
sqlite3 tfl.sqlite

We ran

echo "select * from Disruptions where status = 2 OR status = 6;" |
sqlite3 tfl.sqlite > /dev/null

twice and each run as 12 mins. So we were twice as quick, which is nice.

We then ran

echo "explain select * from Disruptions where status = 2 UNION ALL
select * from Disruptions where status = 6;" | sqlite3 tfl.sqlite

twice. Each run was around 11.5 mins. We're not going to get into
differences of less than a minute on a run of this size, so we'll say
they are about the same speed.

Interesting results, clearly the collation does make a big difference.
We are now going to go through the schema and check if we have made the
same mistake elsewhere.

Thanks for your help, we can post the bytecode it people are interested.

Rob

On 17 Mar 2017, at 11:41, Rob Willett wrote:

> Gunter,
>
> I would never presume to describe anybody as a Nerd!
>
> We're just going back to very first position with the 'bad' collation
> index so we can do proper timings as we change things so we understand
> the speed up (we hope there is a speed up)
>
> We've written a quick script to check each version. Once we've put the
> original index back in, we've added a step to generate the SQLite
> bytecode for you. It's the least we can do...
>
> We'll post this when its completed but we suspect it may take most of
> the day now :)
>
> echo "Using Index 'CREATE INDEX "Disruptions_idx4" ON Disruptions
> ("status" COLLATE NOCASE ASC);'"
> echo "explain select * from Disruptions where status = 2 OR status =
> 6;" | sqlite3 tfl.sqlite
> date
> echo "select * from Disruptions where status = 2 OR status = 6;" |
> sqlite3 tfl.sqlite > /dev/null
> date
> echo "select * from Disruptions where status = 2 OR status = 6;" |
> sqlite3 tfl.sqlite > /dev/null
> date
>
> echo "-------"
>
> echo "Creating new index without collation"
> echo "drop index Disruptions_idx4;" | sqlite3 tfl.sqlite
> echo 'CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");' |
> sqlite3 tfl.sqlite
> echo "explain select * from Disruptions where status = 2 OR status =
> 6;" | sqlite3 tfl.sqlite
> date
> echo "select * from Disruptions where status = 2 OR status = 6;" |
> sqlite3 tfl.sqlite > /dev/null
> date
> echo "select * from Disruptions where status = 2 OR status = 6;" |
> sqlite3 tfl.sqlite > /dev/null
> date
>
> echo "-------"
>
> echo "Trying SELECT statement with UNION ALL"
> echo "explain select * from Disruptions where status = 2 OR status =
> 6;" | sqlite3 tfl.sqlite
> date
> echo "select * from Disruptions  where status = 2 UNION ALL select *
> from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
> date
> echo "select * from Disruptions  where status = 2 UNION ALL select *
> from Disruptions where status = 6;" | sqlite3 tfl.sqlite > /dev/null
> date
>
>
> On 17 Mar 2017, at 11:30, Hick Gunter wrote:
>
>> Nerds with chronic byte code affinity like myself would like to see
>> the output of "explain" (without "query plan"), i.e. the SQLite
>> bytecode produced. I guess the query with OR will have a subprogram
>> called once for each status value, whereas I expect the query with
>> UNION ALL to have 2 copies of the search (which would not affect the
>> run time) and maybe even a temporary table of results (which would
>> take longer and use more memory).
>>
>> -----Ursprüngliche Nachricht-----
>> Von: sqlite-users
>> [mailto:[hidden email]] Im Auftrag von
>> Rob Willett
>> Gesendet: Freitag, 17. März 2017 12:19
>> An: SQLite mailing list <[hidden email]>
>> Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index
>> question ...
>>
>> Gunter, Simon,
>>
>> Thanks for the replies, You both seem to be drilling into the
>> collation sequence as a possible issue. We now have a new index and
>> we have just run the query again
>>
>> sqlite> analyze;
>> sqlite> drop index Disruptions_idx4;
>> sqlite> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status");
>> sqlite> explain query plan select * from Disruptions where status = 2
>> OR
>> status = 6;
>> selectid|order|from|detail
>> 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
>> (status=?)
>> 0|0|0|EXECUTE LIST SUBQUERY 1
>> sqlite>
>>
>> So we have a different response from the query planner, which I think
>> is good.
>>
>> If we use your other example
>>
>> sqlite> explain query plan select * from Disruptions  where status =
>> 2
>> UNION ALL select * from Disruptions where status = 6;
>> selectid|order|from|detail
>> 1|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
>> (status=?)
>> 2|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
>> (status=?)
>> 0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
>>
>> I'm not sure which query is going to be faster. We'll have to try and
>> see.
>>
>> Your last suggestion of "select * from Disruptions  where status =2
>> COLLATE NOCASE or status = 6 COLLATE NOCASE" appears to be logically
>> equivalent to "explain query plan select * from Disruptions where
>> status = 2 OR status = 6;" now we have removed the collation from the
>> index.
>>
>> sqlite> explain query plan select * from Disruptions  where status =2
>> COLLATE NOCASE or status = 6 COLLATE NOCASE;
>> selectid|order|from|detail
>> 0|0|0|SEARCH TABLE Disruptions USING INDEX Disruptions_idx4
>> (status=?)
>> 0|0|0|EXECUTE LIST SUBQUERY 1
>> sqlite>
>>
>> I'll check if we require all the fields, we require many (which I
>> agree is not all) of the fields. Following this logic through, does
>> this mean that it will do more file access bringing the records in
>> from the file system?
>>
>> The collation issue seems to be an artifact of the way Navcat for
>> SQLite works. I suspect we need to be more careful about how we use
>> the tool.
>>
>> We'll now time the results of each query and run them twice to see
>> the affect. No idea how long this will take but suspect a few hours
>> :) I will post back the results as other people may (or may not) find
>> this helpful.
>>
>> Thanks
>>
>> Rob
>>
>> On 17 Mar 2017, at 10:57, Hick Gunter wrote:
>>
>>>> On 17 Mar 2017, at 10:20am, Rob Willett
>>>> <[hidden email]> wrote:
>>>>
>>>>> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE
>>>>> NOCASE ASC);
>>>>>
>>>>> […]
>>>>>
>>>>> As part of the larger more complex query, we are executing the
>>>>> query
>>>>>
>>>>> ```
>>>>> select * from Disruptions where status = 2 OR status = 6;
>>>>
>>>> The schema for the table says that "status" is INTEGER.
>>>> You are supplying numbers as arguments.
>>>> Those two match and should create no problem.
>>>>
>>>> But your index has a collation order which is usually used for
>>>> text.
>>>> I don’t see that it is obviously wrong, but it does look a little
>>>> weird.
>>>>
>>>> Try creating another index which is just on "status", without the
>>>> COLLATE clause.
>>>> Then do another ANALYZE, then try the SELECT again.
>>>>
>>>> Simon.
>>>
>>> If the index is deemed unsuitable by SQLite due to its collation
>>> sequence, then I expect it qwould also be ignored in "select ...
>>> status=1" (without the second ORed value)
>>>
>>> If not, then (select ... where status =2 UNION ALL select where
>>> status
>>> = 6) should do the trick
>>>
>>> Do you really require all the fields from Disruptions?
>>>
>>> And yes, collating integers with NOCASE seems quite strange (there
>>> are
>>> no capital or lowercase numbers unless you are using roman numerals
>>> ;)
>>> ); for text affinity, it should render the comparison operators
>>> caseblind, just like "like".
>>>
>>>
>>> ___________________________________________
>>>  Gunter Hick
>>> Software Engineer
>>> Scientific Games International GmbH
>>> FN 157284 a, HG Wien
>>> Klitschgasse 2-4, A-1130 Vienna, Austria
>>> Tel: +43 1 80100 0
>>> E-Mail: [hidden email]
>>>
>>> This communication (including any attachments) is intended for the
>>> use
>>> of the intended recipient(s) only and may contain information that
>>> is
>>> confidential, privileged or legally protected. Any unauthorized use
>>> or
>>> dissemination of this communication is strictly prohibited. If you
>>> have received this communication in error, please immediately notify
>>> the sender by return e-mail message and delete all copies of the
>>> original communication. Thank you for your cooperation.
>>>
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>> ___________________________________________
>>  Gunter Hick
>> Software Engineer
>> Scientific Games International GmbH
>> FN 157284 a, HG Wien
>> Klitschgasse 2-4, A-1130 Vienna, Austria
>> Tel: +43 1 80100 0
>> E-Mail: [hidden email]
>>
>> This communication (including any attachments) is intended for the
>> use of the intended recipient(s) only and may contain information
>> that is confidential, privileged or legally protected. Any
>> unauthorized use or dissemination of this communication is strictly
>> prohibited. If you have received this communication in error, please
>> immediately notify the sender by return e-mail message and delete all
>> copies of the original communication. Thank you for your cooperation.
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Simon Slavin-3

On 17 Mar 2017, at 5:30pm, Rob Willett <[hidden email]> wrote:

> echo "select * from Disruptions where status = 2 OR status = 6;" | sqlite3 tfl.sqlite > /dev/null
>
> twice and each run as 12 mins. So we were twice as quick, which is nice.

Do you actually need all columns ?  If not, then specifying the columns you need can lead to a further speedup.  It might be enough just to specify the columns you need, but you can achieve further increases in speed by making a covering index.  If speed for this SELECT is sufficiently important to you, and you don’t actually need all columns, post again and we’ll explain further.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
Simon,

We're still benchmarking the various changes we've made during the day.
I'm sitting here watching a tail of a log file waiting for it to finish.
Ah the joys of the internet startup :)

Our plan of action is

1. Have baseline from which we can work from. We now have that.

2. Create the correct sub-query index and work out the (hopefully)
increase in speed.

3. Work through all the remaining indexes and check that we have not
made the same mistake. I know we actually have :( Keep rerunning our
benchmark so we know if we are actually making a difference.

4. Work through returning just the columns we actually need from our
queries. We have a recollection that if we can build an index with all
the information necessary in it, we can do all the work in joins rather
than paging out to disk. Is this what you are referring to?

5. Sleep (not exactly sure when) and watch three international rugby
games tomorrow.

Rob

On 17 Mar 2017, at 18:15, Simon Slavin wrote:

> On 17 Mar 2017, at 5:30pm, Rob Willett <[hidden email]>
> wrote:
>
>> echo "select * from Disruptions where status = 2 OR status = 6;" |
>> sqlite3 tfl.sqlite > /dev/null
>>
>> twice and each run as 12 mins. So we were twice as quick, which is
>> nice.
>
> Do you actually need all columns ?  If not, then specifying the
> columns you need can lead to a further speedup.  It might be enough
> just to specify the columns you need, but you can achieve further
> increases in speed by making a covering index.  If speed for this
> SELECT is sufficiently important to you, and you don’t actually need
> all columns, post again and we’ll explain further.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Hick Gunter
Ad 4) not quite,but close. If the index used for a join also contains all the other fields of that table that are referenced in the query, SQLite can use those values to avoid reading in the corresponding table row. This saves memory (no storage for table row consumed), CPU cycles (no going through another BTree structure) and disk IO (no access to the tables' pages).

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Rob Willett
Gesendet: Freitag, 17. März 2017 19:22
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...

Simon,

We're still benchmarking the various changes we've made during the day.
I'm sitting here watching a tail of a log file waiting for it to finish.
Ah the joys of the internet startup :)

Our plan of action is

1. Have baseline from which we can work from. We now have that.

2. Create the correct sub-query index and work out the (hopefully) increase in speed.

3. Work through all the remaining indexes and check that we have not made the same mistake. I know we actually have :( Keep rerunning our benchmark so we know if we are actually making a difference.

4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to?

5. Sleep (not exactly sure when) and watch three international rugby games tomorrow.

Rob

On 17 Mar 2017, at 18:15, Simon Slavin wrote:

> On 17 Mar 2017, at 5:30pm, Rob Willett <[hidden email]>
> wrote:
>
>> echo "select * from Disruptions where status = 2 OR status = 6;" |
>> sqlite3 tfl.sqlite > /dev/null
>>
>> twice and each run as 12 mins. So we were twice as quick, which is
>> nice.
>
> Do you actually need all columns ?  If not, then specifying the
> columns you need can lead to a further speedup.  It might be enough
> just to specify the columns you need, but you can achieve further
> increases in speed by making a covering index.  If speed for this
> SELECT is sufficiently important to you, and you don’t actually need
> all columns, post again and we’ll explain further.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Simon Slavin-3
In reply to this post by Rob Willett

On 17 Mar 2017, at 6:22pm, Rob Willett <[hidden email]> wrote:

> 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to?

It works only where all the columns you need to read are in the same table.  The ideal form of a covering index is to have the columns listed in this order:

1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all.  This can lead to something like a doubling of speed.  Of course, you sacrifice filespace, and making changes to the table takes a little longer.

> 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow.

Sleep while waiting for indexes to be created and ANALYZE to work.  May you see skilled players, creative moves and dramatic play.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

David Raymond
The data for each record is stored in the same order as the fields are defined. When reading a record's information, SQLite starts at the first field, and stops at the last field needed to complete the query. The situation where this makes the most difference is where there's a reasonably sized blob field, those should always be the final fields in the table's schema. If you have an important field after the large blob, SQLite has to load through the whole blob to get to that field, including going through any needed overflow pages (which are a linked list). If your query doesn't need that extra information, then SQLite can stop and not bother reading all that extra data. Since SQLite only reads whole pages at a time from disk though, that's mostly only relevant when you've got a lot of fields or large ones which result in overflow pages being needed.

The not loading more fields than needed is in part why you see a lot of sub queries that have "select 1 from", usually along the lines of
"...where exists(select 1 from otherTable where...)..."
By using the constant of 1 you avoid artificially making SQLite grab any more fields than what it needs for the where clause.


Of course, I have been known to be wrong on these things.

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Friday, March 17, 2017 2:40 PM
To: SQLite mailing list
Subject: Re: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...


On 17 Mar 2017, at 6:22pm, Rob Willett <[hidden email]> wrote:

> 4. Work through returning just the columns we actually need from our queries. We have a recollection that if we can build an index with all the information necessary in it, we can do all the work in joins rather than paging out to disk. Is this what you are referring to?

It works only where all the columns you need to read are in the same table.  The ideal form of a covering index is to have the columns listed in this order:

1) columns needed for the WHERE clause
2) columns needed for the ORDER BY clause which aren’t in (1)
3) columns needed to be read which aren’t in (2) or (1)

SQLite detects that all the information it needs for the SELECT is available from the index, so it doesn’t bother to read the table at all.  This can lead to something like a doubling of speed.  Of course, you sacrifice filespace, and making changes to the table takes a little longer.

> 5. Sleep (not exactly sure when) and watch three international rugby games tomorrow.

Sleep while waiting for indexes to be created and ANALYZE to work.  May you see skilled players, creative moves and dramatic play.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
In reply to this post by Simon Slavin-3
Dear all,

We progress steadily forward.

1. We immediately halved our execution time by moving to an updated
index that doesn't use COLLATE. Thats 32mins to 16mins.

2. We then shaved a further six minutes off the execution time by
removing extraneous fields in the select statement, so instead of
"select * ...", we identified which fields we used and directly selected
those. So we are now down to 10 mins or 1/3 of when we started for, to
be honest, virtually no extra work, merely being smarter, or rather you
being smarter.

3. We have looked through all our indexes and can see that every index
has a COLLATE against it, even if the column is an integer. We have
raised a support call with Navicat.

4. The next step is to create a "covering index" to try and get the
whole of the query into the index. However its 22:11 in London and I
need to get home.

Thanks very much for the help so far. Tomorrow is more tricky but I'll
read up on covering indexes to see how to use them,.

Rob

On 17 Mar 2017, at 18:39, Simon Slavin wrote:

> On 17 Mar 2017, at 6:22pm, Rob Willett <[hidden email]>
> wrote:
>
>> 4. Work through returning just the columns we actually need from our
>> queries. We have a recollection that if we can build an index with
>> all the information necessary in it, we can do all the work in joins
>> rather than paging out to disk. Is this what you are referring to?
>
> It works only where all the columns you need to read are in the same
> table.  The ideal form of a covering index is to have the columns
> listed in this order:
>
> 1) columns needed for the WHERE clause
> 2) columns needed for the ORDER BY clause which aren’t in (1)
> 3) columns needed to be read which aren’t in (2) or (1)
>
> SQLite detects that all the information it needs for the SELECT is
> available from the index, so it doesn’t bother to read the table at
> all.  This can lead to something like a doubling of speed.  Of course,
> you sacrifice filespace, and making changes to the table takes a
> little longer.
>
>> 5. Sleep (not exactly sure when) and watch three international rugby
>> games tomorrow.
>
> Sleep while waiting for indexes to be created and ANALYZE to work.  
> May you see skilled players, creative moves and dramatic play.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Keith Medcalf
In reply to this post by Rob Willett

Change your query to

explain query plan select * from Disruptions where status = 2 collate nocase OR status = 6 collate nocase;

to match your index.

> -----Original Message-----
> From: sqlite-users [mailto:[hidden email]]
> On Behalf Of Rob Willett
> Sent: Friday, 17 March, 2017 04:20
> To: SQLite mailing list
> Subject: [sqlite] Yet Another Why Doesn't Sqlite Use My Index question ...
>
> Hi,
>
> We've been struggling with a complex query that we have written. One of
> the elements of this complex query is a select statement that doesn't
> appear to use an index when we *think* it should do. We're not going to
> ask you to debug a large complex SQL query (unless you have nothing else
> to do today) but we're working our way through the query trying to
> understand where we've gone wrong. So we've broken down the query and
> are trying each section to see what it performs (or rather doesn't
> perform like).
>
> The sub query is a simple select on a large table, Disruptions. The
> Disruptions table has 180M rows of data.
>
> The schema for it is here. We've pulled it straight from Navicat for
> SQLite.
>
> ```
> CREATE TABLE "Disruptions" (
> "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
> "version" integer NOT NULL,
> "Disruption_id" INTEGER NOT NULL,
> "status" integer NOT NULL,
> "severity" integer NOT NULL,
> "levelOfInterest" integer NOT NULL,
> "category" integer NOT NULL,
> "subCategory" integer NOT NULL,
> "startTime" TEXT NOT NULL,
> "endTime" text NOT NULL,
> "location" integer NOT NULL,
> "corridor" integer NOT NULL,
> "comments" integer NOT NULL,
> "currentUpdate" integer NOT NULL,
> "remarkTime" TEXT NOT NULL,
> "lastModTime" TEXT NOT NULL,
> "CauseAreaPointX" real NOT NULL,
> "CauseAreaPointY" real NOT NULL,
> "Direction" TEXT
> );
> INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("Disruptions",
> '184626834');
>
> -- ----------------------------
> --  Indexes structure for table Disruptions
> -- ----------------------------
> CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
> NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC,
> "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
> "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id"
> COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
> NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE
> NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE
> NOCASE ASC);
> CREATE INDEX "Disruptions_idx4" ON Disruptions ("status" COLLATE NOCASE
> ASC);
>
> PRAGMA foreign_keys = true;
> ```
>
> As part of the larger more complex query, we are executing the query
>
> ```
> select * from Disruptions where status = 2 OR status = 6;
> ```
>
> Status is one of six values, 1 to 6 and is probably not evenly
> distributed across the 180M rows.
>
> If we do
>
> ```
> sqlite> explain query plan select * from Disruptions where status = 2 OR
> status = 6;
> selectid|order|from|detail
> 0|0|0|SCAN TABLE Disruptions
> ```
>
> We can see that table scanning a 180M records is going to be slow, no
> matter what the rest of the complex query is like.
>
> We have an index Disruptions_idx4 which we *think* should speed it up,
> but the query plan doesn't seem to take this into account.
>
> We think that only having six values of Status means that the speed up
> from the index is not going to be fantastic but every little helps.
>
> We have run analyze on the database and that hasn't helped.
>
> Our gut feeling at this moment is that we have the design structure
> wrong in our DB and we're going to have to take a long hard look at what
> we're doing, do a redesign and a rebuild as we simply got it wrong from
> the beginning. Hindsight is wonderful ;) In the interim (as this is a
> 2-3 month job), we need to speed our query up from 90 mins down to
> something in the tens of mins.
>
> Any suggestions very much welcomed,
>
> Thanks
>
> Rob
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Keith Medcalf
In reply to this post by Rob Willett
On Friday, 17 March, 2017 16:13, Rob Willett <[hidden email]> said:

> 3. We have looked through all our indexes and can see that every index
> has a COLLATE against it, even if the column is an integer. We have
> raised a support call with Navicat.

If you want a table field to be non-case-sensitive, the proper place to declare that field as such is when you define the table.

create table Words (Word text unique collate nocase);

means that everywhere that you use Wowrds.Word is not case sensitive (but it is case preserving).  End of Story.

create table Words (Word text);
create unique index sqlite_autoindex_Words_1 on Words (Word collate nocase);

is however an entirely different thing.  The data in Words.Word is case sensitive, and the index is not case sensitive.  Thus query will operate differently depending on how you declared your table and index:

select * from Words where word = 'Apple';

if the former (nocase collation on the table) will (a) use the index and (b) be equivalent to where lower(word) = lower('Apple') (that is, case insensitive)

in the case where you have declared a collate nocase index, the index will NOT BE USED unless you specify COLLATE NOCASE in the where clause.  By default the search will be a table scan that is case sensitive.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Simon Slavin-3
In reply to this post by Rob Willett

On 17 Mar 2017, at 10:12pm, Rob Willett <[hidden email]> wrote:

> 3. We have looked through all our indexes and can see that every index has a COLLATE against it, even if the column is an integer. We have raised a support call with Navicat.

This might also be something that Navicat should look into.  The most efficient way to implement COLLATE in SQLite is to do it in the column definition.  At the moment, for example, the table is defined as

> CREATE TABLE "Disruptions" (
> "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
> […]
> "Direction" TEXT
> );

.  If instead it said

CREATE TABLE "Disruptions" (
         "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
         […]
         "Direction" TEXT COLLATE NOCASE
);

then all comparisons and indexes which mentioned "Direction" would automatically use NOCASE without having to have COLLATE NOCASE mentioned in them.  This is almost always The Right Thing, and what the programmer would want.  It simplifies all the other SQL commands used by the program.  And since the collation on the index then matches the collation for the column definition, SQLite has to do less work every time it deals with the index.

Unfortunately, this change cannot be made with backward compatibility.  Changing the table definition would involve remaking the table, probably by defining a new table with the new definition, copying the data across, deleting the original table, and renaming the new one.  If backward compatibility is important in support of the program then this may be a deal-breaker and one could understand why the developer team won’t make the change.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Rob Willett
In reply to this post by Rob Willett
We've just implemented a covering index for the last step (is it
really?) in our quest to get the query execution time down.

To summarise we have gone from 32 mins to 16 mins by updating an index
so it doesn't use collate, we took another six minutes off by removing
extra fields in the select we didn't need.

We have just created a new index which 'covers' all the fields we use in
the select, this means (and I paraphrase) that we use the index to get
all the data and there is no need to read from the database.

Well that was a bit of a surprise, the index creation took 45 mins, we
ran the program again and thought, rats, we've cocked it up, it only
took 54 secs, we got something wrong. So we checked it and checked again
and we hasn't got anything wrong. Our query has moved from 32 mins to 54
secs.

We're quite happy with that performance increase. In fact we're
delighted, so thanks for all the help in getting us to this stage.

We have kept copies of the query planner bytecode output if anybody is
interested. Gunter has had copies, but if anybody else would like them,
please ask.

Many thanks again for all the help,

Rob

On 17 Mar 2017, at 22:12, Rob Willett wrote:

> Dear all,
>
> We progress steadily forward.
>
> 1. We immediately halved our execution time by moving to an updated
> index that doesn't use COLLATE. Thats 32mins to 16mins.
>
> 2. We then shaved a further six minutes off the execution time by
> removing extraneous fields in the select statement, so instead of
> "select * ...", we identified which fields we used and directly
> selected those. So we are now down to 10 mins or 1/3 of when we
> started for, to be honest, virtually no extra work, merely being
> smarter, or rather you being smarter.
>
> 3. We have looked through all our indexes and can see that every index
> has a COLLATE against it, even if the column is an integer. We have
> raised a support call with Navicat.
>
> 4. The next step is to create a "covering index" to try and get the
> whole of the query into the index. However its 22:11 in London and I
> need to get home.
>
> Thanks very much for the help so far. Tomorrow is more tricky but I'll
> read up on covering indexes to see how to use them,.
>
> Rob
>
> On 17 Mar 2017, at 18:39, Simon Slavin wrote:
>
>> On 17 Mar 2017, at 6:22pm, Rob Willett <[hidden email]>
>> wrote:
>>
>>> 4. Work through returning just the columns we actually need from our
>>> queries. We have a recollection that if we can build an index with
>>> all the information necessary in it, we can do all the work in joins
>>> rather than paging out to disk. Is this what you are referring to?
>>
>> It works only where all the columns you need to read are in the same
>> table.  The ideal form of a covering index is to have the columns
>> listed in this order:
>>
>> 1) columns needed for the WHERE clause
>> 2) columns needed for the ORDER BY clause which aren’t in (1)
>> 3) columns needed to be read which aren’t in (2) or (1)
>>
>> SQLite detects that all the information it needs for the SELECT is
>> available from the index, so it doesn’t bother to read the table at
>> all.  This can lead to something like a doubling of speed.  Of
>> course, you sacrifice filespace, and making changes to the table
>> takes a little longer.
>>
>>> 5. Sleep (not exactly sure when) and watch three international rugby
>>> games tomorrow.
>>
>> Sleep while waiting for indexes to be created and ANALYZE to work.  
>> May you see skilled players, creative moves and dramatic play.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Yet Another Why Doesn't Sqlite Use My Index question ...

Paul Sanderson
What is the average size of the text in the direction field? and what page
size have you set for the database? If the size of a record is such that
only a small handful fit into a page, or worse each record overflows (and
your select includes the direction field) then this could impact
performance.

Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence

On 18 March 2017 at 05:48, Rob Willett <[hidden email]> wrote:

> We've just implemented a covering index for the last step (is it really?)
> in our quest to get the query execution time down.
>
> To summarise we have gone from 32 mins to 16 mins by updating an index so
> it doesn't use collate, we took another six minutes off by removing extra
> fields in the select we didn't need.
>
> We have just created a new index which 'covers' all the fields we use in
> the select, this means (and I paraphrase) that we use the index to get all
> the data and there is no need to read from the database.
>
> Well that was a bit of a surprise, the index creation took 45 mins, we ran
> the program again and thought, rats, we've cocked it up, it only took 54
> secs, we got something wrong. So we checked it and checked again and we
> hasn't got anything wrong. Our query has moved from 32 mins to 54 secs.
>
> We're quite happy with that performance increase. In fact we're delighted,
> so thanks for all the help in getting us to this stage.
>
> We have kept copies of the query planner bytecode output if anybody is
> interested. Gunter has had copies, but if anybody else would like them,
> please ask.
>
> Many thanks again for all the help,
>
> Rob
>
>
> On 17 Mar 2017, at 22:12, Rob Willett wrote:
>
> Dear all,
>>
>> We progress steadily forward.
>>
>> 1. We immediately halved our execution time by moving to an updated index
>> that doesn't use COLLATE. Thats 32mins to 16mins.
>>
>> 2. We then shaved a further six minutes off the execution time by
>> removing extraneous fields in the select statement, so instead of "select *
>> ...", we identified which fields we used and directly selected those. So we
>> are now down to 10 mins or 1/3 of when we started for, to be honest,
>> virtually no extra work, merely being smarter, or rather you being smarter.
>>
>> 3. We have looked through all our indexes and can see that every index
>> has a COLLATE against it, even if the column is an integer. We have raised
>> a support call with Navicat.
>>
>> 4. The next step is to create a "covering index" to try and get the whole
>> of the query into the index. However its 22:11 in London and I need to get
>> home.
>>
>> Thanks very much for the help so far. Tomorrow is more tricky but I'll
>> read up on covering indexes to see how to use them,.
>>
>> Rob
>>
>> On 17 Mar 2017, at 18:39, Simon Slavin wrote:
>>
>> On 17 Mar 2017, at 6:22pm, Rob Willett <[hidden email]>
>>> wrote:
>>>
>>> 4. Work through returning just the columns we actually need from our
>>>> queries. We have a recollection that if we can build an index with all the
>>>> information necessary in it, we can do all the work in joins rather than
>>>> paging out to disk. Is this what you are referring to?
>>>>
>>>
>>> It works only where all the columns you need to read are in the same
>>> table.  The ideal form of a covering index is to have the columns listed in
>>> this order:
>>>
>>> 1) columns needed for the WHERE clause
>>> 2) columns needed for the ORDER BY clause which aren’t in (1)
>>> 3) columns needed to be read which aren’t in (2) or (1)
>>>
>>> SQLite detects that all the information it needs for the SELECT is
>>> available from the index, so it doesn’t bother to read the table at all.
>>> This can lead to something like a doubling of speed.  Of course, you
>>> sacrifice filespace, and making changes to the table takes a little longer.
>>>
>>> 5. Sleep (not exactly sure when) and watch three international rugby
>>>> games tomorrow.
>>>>
>>>
>>> Sleep while waiting for indexes to be created and ANALYZE to work.  May
>>> you see skilled players, creative moves and dramatic play.
>>>
>>> Simon.
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12