Query optimisation

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Query optimisation

Dave Wellman
HI all,

 

I would like to use the following example as a learning exercise for myself
to check my understanding of part of sqlite processing.

 

I have the following query which functionally works fine, and to be upfront
about it the volume of data is so small that performance is not an issue.

 

The query:

update relation

   set wastatsidcount = (select src.wastatsidcount

                              from waSTATSINFO_VT as src

                              where src.relationkey = relation.relationkey)

where wastatsidcount is null

  and queryid = 2;

 

The plan:

If I run 'explain query plan' on this command it gives the following:

SelectedID          Order    From     Detail

0                              0              0              SCAN TABLE
relation

0                              0              0              EXECUTE
CORRELATED SCALAR SUBQUERY 0

0                              0              0              SEARCH TABLE
waSTATSINFO_VT AS src USING INTEGER PRIMARY KEY (rowid=?)

 

My 'src' table is defined with a PRIMARY KEY on column RELATIONKEY.

The same column is also the primary key on the 'relation' table.

 

My understanding of this plan is:

-          Read the relation table using a full table scan

-          (Assumption) Any row in the relation table that does NOT meet the
WHERE clause is ignored.

-          For each qualifying row in the relation table read from
waSTATSINFO_VT using the PK index to try and find a match

-          Where there is a match, update the relation table.

 

Questions:

-          Is my assumption above ("Any row in the relation table that does
NOT meet the WHERE clause is ignored") correct?

-          Is there any form of 'explain' or other diagnostic output which
would show me this?
I tried the 'explain query plan' with and without the full WHERE clause and
it didn't change the output (I didn't understand the output from the plain
'explain' command!)

 

In this particular example I need the WHERE clause as coded in order to give
me the correct answer, but as I said at the start I'm trying to deepen my
knowledge of SQLite performance and it's optimiser.

 

I'm currently using v3.20.1.

 

Cheers,

Dave

 

 

Ward Analytics Ltd - information in motion

Tel: +44 (0) 118 9740191

Fax: +44 (0) 118 9740192

www: http://www.ward-analytics.com

 

Registered office address: The Oriel, Sydenham Road, Guildford, Surrey,
United Kingdom, GU1 3SR

Registered company number: 3917021 Registered in England and Wales.

 

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query optimisation

David Raymond
Running just "explain some query" will give you the virtual machine program that it plans on using. You can then scan through that to see what it's doing. Note that the descriptions on the below page for those op codes are sometimes really confusing and it can take a while to decypher what's going on.

https://www.sqlite.org/opcode.html


"(Assumption) Any row in the relation table that does NOT meet the WHERE clause is ignored."

Down below, lines 5 and 6 (and 24) are in the outer loop and basically say "if waStatsIDCount is not null then go to the next record", so correct.


"For each qualifying row in the relation table read from waSTATSINFO_VT using the PK index to try and find a match"

Line 14 shows that it's using waStatsInfo_VT (the main rowid/integer primary key table) for that part of the program. and Line 17 has it trying to find a matching rowid


"Where there is a match, update the relation table."

Even if there isn't a match. In that case it would get updated with null. All of the "whether or not to actually do the update" bits have already been looked at.


SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> create table relation (relationKey integer primary key, waStatsIDCount, queryID);

sqlite> create table waStatsInfo_VT (relationKey integer primary key, waStatsIDCount);

sqlite> explain query plan update relation set waStatsIDCount = (select src.waStatsIDCount from waStatsInfo_VT as src where src.relationKey = relation.relationKey) where waStatsIDCount is null and queryID = 2;
selectid|order|from|detail
0|0|0|SCAN TABLE relation
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 0
0|0|0|SEARCH TABLE waStatsInfo_VT AS src USING INTEGER PRIMARY KEY (rowid=?)

sqlite> explain update relation set waStatsIDCount = (select src.waStatsIDCount from waStatsInfo_VT as src where src.relationKey = relation.relationKey) where waStatsIDCount is null and queryID = 2;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     26    0                    00  Start at 26
1     Null           0     1     2                    00  r[1..2]=NULL
2     OpenWrite      0     2     0     3              00  root=2 iDb=0; relation
3     Explain        0     0     0     SCAN TABLE relation  00
4     Rewind         0     25    0                    00
5       Column         0     1     6                    00  r[6]=relation.waStatsIDCount
6       NotNull        6     24    0                    00  if r[6]!=NULL goto 24
7       Column         0     2     7                    00  r[7]=relation.queryID
8       Ne             8     24    7     (BINARY)       51  if r[7]!=r[8] goto 24
9       Rowid          0     2     0                    00  r[2]=rowid
10      IsNull         2     25    0                    00  if r[2]==NULL goto 25
11      Null           0     3     0                    00  r[3]=NULL
12      Null           0     9     9                    00  r[9..9]=NULL; Init subquery result
13      Integer        1     10    0                    00  r[10]=1; LIMIT counter
14      OpenRead       1     3     0     2              00  root=3 iDb=0; waStatsInfo_VT
15      Explain        0     0     0     SEARCH TABLE waStatsInfo_VT AS src USING INTEGER PRIMARY KEY (rowid=?)  00
16      Rowid          0     11    0                    00  r[11]=rowid
17      SeekRowid      1     20    11                   00  intkey=r[11]; pk
18      Column         1     1     9                    00  r[9]=waStatsInfo_VT.waStatsIDCount
19      DecrJumpZero   10    20    0                    00  if (--r[10])==0 goto 20
20      SCopy          9     4     0                    00  r[4]=r[9]
21      Column         0     2     5                    00  r[5]=relation.queryID
22      MakeRecord     3     3     7     D              00  r[7]=mkrec(r[3..5])
23      Insert         0     7     2     relation       07  intkey=r[2] data=r[7]
24    Next           0     5     0                    01
25    Halt           0     0     0                    00
26    Transaction    0     1     2     0              01  usesStmtJournal=0
27    Integer        2     8     0                    00  r[8]=2
28    Goto           0     1     0                    00

sqlite>



-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of David Wellman
Sent: Friday, August 24, 2018 7:47 AM
To: SQLite Users
Subject: [sqlite] Query optimisation

HI all,

 

I would like to use the following example as a learning exercise for myself
to check my understanding of part of sqlite processing.

 

I have the following query which functionally works fine, and to be upfront
about it the volume of data is so small that performance is not an issue.

 

The query:

update relation

   set wastatsidcount = (select src.wastatsidcount

                              from waSTATSINFO_VT as src

                              where src.relationkey = relation.relationkey)

where wastatsidcount is null

  and queryid = 2;

 

The plan:

If I run 'explain query plan' on this command it gives the following:

SelectedID          Order    From     Detail

0                              0              0              SCAN TABLE
relation

0                              0              0              EXECUTE
CORRELATED SCALAR SUBQUERY 0

0                              0              0              SEARCH TABLE
waSTATSINFO_VT AS src USING INTEGER PRIMARY KEY (rowid=?)

 

My 'src' table is defined with a PRIMARY KEY on column RELATIONKEY.

The same column is also the primary key on the 'relation' table.

 

My understanding of this plan is:

-          Read the relation table using a full table scan

-          (Assumption) Any row in the relation table that does NOT meet the
WHERE clause is ignored.

-          For each qualifying row in the relation table read from
waSTATSINFO_VT using the PK index to try and find a match

-          Where there is a match, update the relation table.

 

Questions:

-          Is my assumption above ("Any row in the relation table that does
NOT meet the WHERE clause is ignored") correct?

-          Is there any form of 'explain' or other diagnostic output which
would show me this?
I tried the 'explain query plan' with and without the full WHERE clause and
it didn't change the output (I didn't understand the output from the plain
'explain' command!)

 

In this particular example I need the WHERE clause as coded in order to give
me the correct answer, but as I said at the start I'm trying to deepen my
knowledge of SQLite performance and it's optimiser.

 

I'm currently using v3.20.1.

 

Cheers,

Dave

 

 

Ward Analytics Ltd - information in motion

Tel: +44 (0) 118 9740191

Fax: +44 (0) 118 9740192

www: http://www.ward-analytics.com

 

Registered office address: The Oriel, Sydenham Road, Guildford, Surrey,
United Kingdom, GU1 3SR

Registered company number: 3917021 Registered in England and Wales.

 

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query optimisation

Richard Hipp-3
On 8/24/18, David Raymond <[hidden email]> wrote:
> Running just "explain some query" will give you the virtual machine program
> that it plans on using. You can then scan through that to see what it's
> doing. Note that the descriptions on the below page for those op codes are
> sometimes really confusing and it can take a while to decypher what's going
> on.
>
> https://www.sqlite.org/opcode.html

To further confuse matters, the https://www.sqlite.org/opcode.html
page only describes the opcodes for the latest release (3.24.0)
whereas the OP is using an earlier release (3.20, I think).  Opcodes
and their meanings can change from one release to the next.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query optimisation

Dave Wellman
Hi Richard and David,

Many thanks for your responses, very useful.

I'll work with that (being careful to look at the OpCode documentation for my release of sqlite!) and see where I get to.

Cheers,
Dave


Ward Analytics Ltd - information in motion
Tel: +44 (0) 118 9740191
Fax: +44 (0) 118 9740192
www: http://www.ward-analytics.com

Registered office address: The Oriel, Sydenham Road, Guildford, Surrey, United Kingdom, GU1 3SR
Registered company number: 3917021 Registered in England and Wales.

-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Richard Hipp
Sent: 24 August 2018 15:43
To: SQLite mailing list
Subject: Re: [sqlite] Query optimisation

On 8/24/18, David Raymond <[hidden email]> wrote:
> Running just "explain some query" will give you the virtual machine program
> that it plans on using. You can then scan through that to see what it's
> doing. Note that the descriptions on the below page for those op codes are
> sometimes really confusing and it can take a while to decypher what's going
> on.
>
> https://www.sqlite.org/opcode.html

To further confuse matters, the https://www.sqlite.org/opcode.html
page only describes the opcodes for the latest release (3.24.0)
whereas the OP is using an earlier release (3.20, I think).  Opcodes
and their meanings can change from one release to the next.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

keys

ward
In reply to this post by Dave Wellman
"The parent key of a foreign key constraint is not allowed to use the
rowid. The parent key must used named columns only."

Why is this?
Which would be more efficient?
1) WITHOUT ROWID and column of INTEGER PRIMARY KEY
or
2) an aliased rowid.

Background: The data is sparse, incomplete, and fuzzy. Two records are
known to have a relationship
even though the content that identifies this relation is not known.


Lyle Ward
Genealogist

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: keys

J Decker
On Fri, Aug 24, 2018 at 4:31 PM <[hidden email]> wrote:

> "The parent key of a foreign key constraint is not allowed to use the
> rowid. The parent key must used named columns only."
>
> Why is this?
> Which would be more efficient?
> 1) WITHOUT ROWID and column of INTEGER PRIMARY KEY
> or
> 2) an aliased rowid.
>
I expect because rowid is subject to change....
insert 3 records, delete the first, and 2, 3 become 1 and 2 ... which would
have to cascade update ALL other FK refs...


>
> Background: The data is sparse, incomplete, and fuzzy. Two records are
> known to have a relationship
> even though the content that identifies this relation is not known.
>
>
> Lyle Ward
> Genealogist
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: keys

ward
got it. Thanks.

On 2018-08-24 19:36, J Decker wrote:

> On Fri, Aug 24, 2018 at 4:31 PM <[hidden email]> wrote:
>
>> "The parent key of a foreign key constraint is not allowed to use the
>> rowid. The parent key must used named columns only."
>>
>> Why is this?
>> Which would be more efficient?
>> 1) WITHOUT ROWID and column of INTEGER PRIMARY KEY
>> or
>> 2) an aliased rowid.
>>
> I expect because rowid is subject to change....
> insert 3 records, delete the first, and 2, 3 become 1 and 2 ... which
> would
> have to cascade update ALL other FK refs...
>
>
>>
>> Background: The data is sparse, incomplete, and fuzzy. Two records are
>> known to have a relationship
>> even though the content that identifies this relation is not known.
>>
>>
>> Lyle Ward
>> Genealogist
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: keys

Simon Slavin-3
In reply to this post by ward
On 25 Aug 2018, at 12:31am, [hidden email] wrote:

> "The parent key of a foreign key constraint is not allowed to use the rowid. The parent key must used named columns only."
>
> Why is this?

Because it's not named.  In theory you could later add a column named 'rowid' to mean a BLOB column.

> Which would be more efficient?
> 1) WITHOUT ROWID and column of INTEGER PRIMARY KEY
> or
> 2) an aliased rowid.

I think it would be (2).  Actually define the column 'rowid' as INTEGER PRIMARY KEY.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: keys

Keith Medcalf
In reply to this post by ward
On Friday, 24 August, 2018 17:31, [hidden email] wrote:

>"The parent key of a foreign key constraint is not allowed to use the
>rowid. The parent key must used named columns only."

>Why is this?

You should think of this as:

>"The parent key of a foreign key constraint is not allowed to use the
>rowid. The parent key must use named columns only, which may include
>an explicitly named rowid (INTEGER PRIMARY KEY) column."

This is because unless the rowid column (the INTEGER PRIMARY KEY) of a standard table is explicitly named, it's value will not be dumped by the dump command, and may be changed by running operations such as VACUUM on the database (since it is not a named data item of the row -- it is merely an internal ephemeral identifier not associated with the rest of the row -- it is not a candidate key for the row).  If the INTEGER PRIMARY KEY is explicitly named then it is a "value" in the row and will be dumped by the dump command, and will not change when a vacuum of the database is done since it is a part of the row data (as a named column it is a valid candidate key for the row).

Otherwise it is still the "record number" of the row within the table and the behaviour associated with that status is not changed (that is, its value cannot be null and if you attempt to insert a record with a null INTEGER PRIMARY KEY (record number) the value will be set to one greater that the maximum record number in use at the time of insert (or with AUTOINCREMENT, one greater than any record number ever inserted in the table).

Also, you (or someone else) may in the future add a column to the table called "rowid" which would mask the original "rowid" thus destroying referential integrity since unless explicitly given a name references to "rowid" are magical.

>Which would be more efficient?
>1) WITHOUT ROWID and column of INTEGER PRIMARY KEY
>or
>2) an aliased rowid.

No idea since the underlying structures are different between ROWID and WITHOUT ROWID tables.  

In a ROWID table the INTEGER PRIMARY KEY (the implicit record number) identifies the "row" in the storage B-Tree directly -- you may still have other keys (PRIMARY or UNIQUE) which are merely additional indexes on the table data columns with the RowID being the payload of the index; however, the "record number" INTEGER PRIMARY KEY always exists as the primary identifier of the row.

WITHOUT ROWID tables are conceptually just index B-Tree's where the declared PRIMARY KEY is the record key in the index and the other columns are the payload.

The primary difference that you would notice is that in a WITHOUT ROWID table you would have to explicitly give each row a unique rowid value rather than allow the database engine to create one for you when you insert data.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users