Unexpected REINDEX behavior.

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Unexpected REINDEX behavior.

dirdi
Hi there, I noticed some - at least for me - unexpected behavior of the
REINDEX command:

I have a DB with 7 tables, 11 indexes and a size of about 140MiB. The DB
is generated by a python script parsing an XML file. Entries are added
to the tables in arbitrary order.

After the tables have been created and filled, some UPDATE queries are
executed. One of those queries took ~36m. I tried to reduce the runtime
by using the REINDEX command, but with no success. However, if I delete
and recreate one of the indexes, using the DROP INDEX and CREATE INDEX
commands, the UPDATE query takes only ~2m.

Unfortunately I can not share the DB with you, due to legal reasons and
was also not able to create a test script to trigger this behavior.

The two tables, affected by the UPDATE query:

> CREATE TABLE `tbl1` (
>     `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>     `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>     `int` INTEGER NOT NULL,
>     `tbl2_id` INTEGER,
>     `other_str` TEXT DEFAULT NULL,
>     `other_int1` INTEGER DEFAULT NULL,
>     `other_int2` INTEGER NOT NULL DEFAULT 0,
>     FOREIGN KEY(`tbl2_id`) REFERENCES `tbl2`(`id`)
> );
>
> CREATE TABLE `tbl2` (
>     `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>     `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>     `int` INTEGER NOT NULL
> );

Their corresponding indexes:
> CREATE INDEX `idx1` ON `tbl1` (
>     `int`
> );
> CREATE INDEX `idx2` ON `tbl1` (
>     `tbl2_id`
> );
> CREATE INDEX `idx3` ON `tbl2` (
>     `int`
> );
... filling the tables ...

Recreating index idx1 (To archive the speedup, recreation of the other
indexes is not necessary):
> DROP INDEX `idx1`;
> CREATE INDEX `idx1` ON `tbl1` (
>     `int`
> );
And the update query:
> UPDATE `tbl1`
> SET `tbl2_id` = (
>     SELECT `tbl2`.`id`
>     FROM `tbl2`
>     WHERE `tbl1`.`int` >= `tbl2`.`int`
>         AND `tbl1`.`str` LIKE '%' || `tbl2`.`str`
>     ORDER BY `tbl2`.`int` DESC
>     LIMIT 1
> );

And it's corresponding query plan:
> id    parent  notused detail> 3 0 0 SCAN TABLE tbl1
> 23 0 0 CORRELATED SCALAR SUBQUERY 1
> 29 23 0 SEARCH TABLE tbl2 USING INDEX idx3 (int<?)

tbl1 and tbl2 contain about 600k and 20k entries, respectively.

Does one have an explanation for this behavior?
Is it expected that the REINDEX command produces other results than
"manually" recreation of an index?
If yes, under which circumstances does this happen?
And is there some way to measure the "quality" of an index / if it has
been scattered?

PS: I also tried the VACUUM (combined w/ and w/o REINDEX) command to no
success.

--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

Simon Slavin-3
On 29 Aug 2019, at 3:09pm, dirdi <[hidden email]> wrote:

> Does one have an explanation for this behavior?

> Is it expected that the REINDEX command produces other results than "manually" recreation of an index?
> If yes, under which circumstances does this happen?
> And is there some way to measure the "quality" of an index / if it has been scattered?

Just in case, please run an integrity-check on the database as it is before you delete/rebuild the index.

What you're seeing seems weird to me.  One thing that comes to mind is that deleting and recreating an index might put all the pages of the new index at the end of the database file, whereas the REINDEX command might reuse the pages which the index is currently using.  But I don't know whether this is how SQLite really works.

You might be able to learn more about the various versions of that index using this:

<https://sqlite.org/sqlanalyze.html>

It may show that the various versions of idx1 you see/create take different numbers of pages, or have different depths.  However, I agree with you that the amount of time disparity is unexpected.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

Keith Medcalf
In reply to this post by dirdi

Do you have sqlite_stat1 or sqlite_stat4 tables present (that is, have you ever run analyze)?  

Neither running a REINDEX nor a VACUUM will modify the statistics, however, dropping the index will delete the index statistics for the index (and re-creating the index will not re-create the statistics).

If the statistics tables (sqlite_stat1 and/or sqlite_stat4) are present then you update them by running analyze;  You can also drop the sqlite_stat1 and/or sqlite_stat4 tables to get rid of the statistics entirely, or delete individual rows from the table to get rid of stale shape data.

If the sqlite_stat1 or sqlite_stat4 tables are present, then if you run ANALYZE; rather than reindexing or dropping/recreating the index, what is the result?

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of dirdi
>Sent: Thursday, 29 August, 2019 08:10
>To: [hidden email]
>Subject: [sqlite] Unexpected REINDEX behavior.
>
>Hi there, I noticed some - at least for me - unexpected behavior of
>the
>REINDEX command:
>
>I have a DB with 7 tables, 11 indexes and a size of about 140MiB. The
>DB
>is generated by a python script parsing an XML file. Entries are
>added
>to the tables in arbitrary order.
>
>After the tables have been created and filled, some UPDATE queries
>are
>executed. One of those queries took ~36m. I tried to reduce the
>runtime
>by using the REINDEX command, but with no success. However, if I
>delete
>and recreate one of the indexes, using the DROP INDEX and CREATE
>INDEX
>commands, the UPDATE query takes only ~2m.
>
>Unfortunately I can not share the DB with you, due to legal reasons
>and
>was also not able to create a test script to trigger this behavior.
>
>The two tables, affected by the UPDATE query:
>> CREATE TABLE `tbl1` (
>>     `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>>     `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>>     `int` INTEGER NOT NULL,
>>     `tbl2_id` INTEGER,
>>     `other_str` TEXT DEFAULT NULL,
>>     `other_int1` INTEGER DEFAULT NULL,
>>     `other_int2` INTEGER NOT NULL DEFAULT 0,
>>     FOREIGN KEY(`tbl2_id`) REFERENCES `tbl2`(`id`)
>> );
>>
>> CREATE TABLE `tbl2` (
>>     `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>>     `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>>     `int` INTEGER NOT NULL
>> );
>
>Their corresponding indexes:
>> CREATE INDEX `idx1` ON `tbl1` (
>>     `int`
>> );
>> CREATE INDEX `idx2` ON `tbl1` (
>>     `tbl2_id`
>> );
>> CREATE INDEX `idx3` ON `tbl2` (
>>     `int`
>> );
>... filling the tables ...
>
>Recreating index idx1 (To archive the speedup, recreation of the
>other
>indexes is not necessary):
>> DROP INDEX `idx1`;
>> CREATE INDEX `idx1` ON `tbl1` (
>>     `int`
>> );
>And the update query:
>> UPDATE `tbl1`
>> SET `tbl2_id` = (
>>     SELECT `tbl2`.`id`
>>     FROM `tbl2`
>>     WHERE `tbl1`.`int` >= `tbl2`.`int`
>>         AND `tbl1`.`str` LIKE '%' || `tbl2`.`str`
>>     ORDER BY `tbl2`.`int` DESC
>>     LIMIT 1
>> );
>
>And it's corresponding query plan:
>> id    parent  notused detail> 3 0 0 SCAN TABLE tbl1
>> 23 0 0 CORRELATED SCALAR SUBQUERY 1
>> 29 23 0 SEARCH TABLE tbl2 USING INDEX idx3 (int<?)
>
>tbl1 and tbl2 contain about 600k and 20k entries, respectively.
>
>Does one have an explanation for this behavior?
>Is it expected that the REINDEX command produces other results than
>"manually" recreation of an index?
>If yes, under which circumstances does this happen?
>And is there some way to measure the "quality" of an index / if it
>has
>been scattered?
>
>PS: I also tried the VACUUM (combined w/ and w/o REINDEX) command to
>no
>success.
>
>--
>Best regards
>
>dirdi
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

dirdi
On 8/29/19 6:10 PM, Keith Medcalf wrote:
> Do you have sqlite_stat1 or sqlite_stat4 tables present (that is, have you ever run analyze)?  

No.

> If the sqlite_stat1 or sqlite_stat4 tables are present, then if you run ANALYZE; rather than reindexing or dropping/recreating the index, what is the result?

If I run ...

> ANALYZE;
> REINDEX;
> ANALYZE;

... instead of ...

> DROP INDEX `idx1`;
> CREATE INDEX `idx1` ON `tbl1` (
>     `int`
> );

the UPDATE query remains being slow (~36m).

--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

Richard Damon
On 8/29/19 10:56 AM, dirdi wrote:

> On 8/29/19 6:10 PM, Keith Medcalf wrote:
>> Do you have sqlite_stat1 or sqlite_stat4 tables present (that is, have you ever run analyze)?  
> No.
>
>> If the sqlite_stat1 or sqlite_stat4 tables are present, then if you run ANALYZE; rather than reindexing or dropping/recreating the index, what is the result?
> If I run ...
>
>> ANALYZE;
>> REINDEX;
>> ANALYZE;
> ... instead of ...
>
>> DROP INDEX `idx1`;
>> CREATE INDEX `idx1` ON `tbl1` (
>>     `int`
>> );
> the UPDATE query remains being slow (~36m).
>
One thought, could there perhaps be something about the query that the
ANALYZE gives bad information about how the structure of data in the
table that dropping and recreating (which deletes the results of the
last ANALYZE) makes the query planner chose a different plan which works
better. Perhaps you could look at and compare the query plans of the two
different scenarios.

--
Richard Damon

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

dirdi
In reply to this post by Simon Slavin-3
On 8/29/19 6:00 PM, Simon Slavin wrote:
> Just in case, please run an integrity-check on the database as it is before you delete/rebuild the index.

The python script creates the DB from scratch every time. However ...

> PRAGMA integrity_check;

... returned ...

> ok

> You might be able to learn more about the various versions of that index using this:
>
> <https://sqlite.org/sqlanalyze.html>
>
> It may show that the various versions of idx1 you see/create take different numbers of pages, or have different depths.  However, I agree with you that the amount of time disparity is unexpected.

Here is what sqlanalyze reports about the index:

> *** Index IDX1 of table TBL1 *******************************************
>
> Percentage of total database......................   3.7%    
> Number of entries................................. 551407    
> Bytes of storage consumed......................... 5517312  
> Bytes of payload.................................. 3826907     69.4%
> Bytes of metadata................................. 1670381     30.3%
> B-tree depth...................................... 3        
> Average payload per entry......................... 6.94      
> Average unused bytes per entry.................... 0.04      
> Average metadata per entry........................ 3.03      
> Average fanout.................................... 224.00    
> Non-sequential pages.............................. 1282        95.2%
> Maximum payload per entry......................... 7        
> Entries that use overflow......................... 0            0.0%
> Index pages used.................................. 6        
> Primary pages used................................ 1341      
> Overflow pages used............................... 0        
> Total pages used.................................. 1347      
> Unused bytes on index pages....................... 5816        23.7%
> Unused bytes on primary pages..................... 14208        0.26%
> Unused bytes on overflow pages.................... 0        
> Unused bytes on all pages......................... 20024        0.36%

I ran it twice, once before executing DROP INDEX and CREATE INDEX
commands ...

> $ ./sqlite3_analyzer db.sqlite3 > run1

... and once afterwards:

> $ ./sqlite3_analyzer db.sqlite3 > run2

The only difference between both runs:
> $ diff run1 run2
> 1260d1259
> < INSERT INTO space_used VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
> 1270a1270
>> INSERT INTO space_used VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);

--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

Keith Medcalf
In reply to this post by dirdi

Fascinating ... because the optimal query plan would be to do a table scan of tbl1 and then dip into tbl2 for each row to get the update value.  The only meaningful index then would be on tbl2(int) (that is idx3) since an index will be unhelpful for the LIKE constraint.

Is the generated code different for the two queries, the one that runs slow -vs- the one that runs fast?  (That is, the output from EXPLAIN ..., not the output from EXPLAIN QUERY PLAN, or from .eqp full in the CLI).

Also, what version of SQLite3 are you using?

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of dirdi
>Sent: Thursday, 29 August, 2019 11:57
>To: [hidden email]
>Subject: Re: [sqlite] Unexpected REINDEX behavior.
>
>On 8/29/19 6:10 PM, Keith Medcalf wrote:
>> Do you have sqlite_stat1 or sqlite_stat4 tables present (that is,
>have you ever run analyze)?
>
>No.
>
>> If the sqlite_stat1 or sqlite_stat4 tables are present, then if you
>run ANALYZE; rather than reindexing or dropping/recreating the index,
>what is the result?
>
>If I run ...
>
>> ANALYZE;
>> REINDEX;
>> ANALYZE;
>
>... instead of ...
>
>> DROP INDEX `idx1`;
>> CREATE INDEX `idx1` ON `tbl1` (
>>     `int`
>> );
>
>the UPDATE query remains being slow (~36m).
>
>--
>Best regards
>
>dirdi
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

Simon Slavin-3
In reply to this post by dirdi
On 29 Aug 2019, at 8:12pm, dirdi <[hidden email]> wrote:

> The only difference between both runs:

Well, that's nothing.  I see no reason for the massive change in timing from what you posted.  But someone else might.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

dirdi
On 8/29/19 9:35 PM, Simon Slavin wrote:
> On 29 Aug 2019, at 8:12pm, dirdi <[hidden email]> wrote:
>
>> The only difference between both runs:
>
> Well, that's nothing.  I see no reason for the massive change in timing from what you posted.  But someone else might.

Well the quoted line was shifted to output's second to last line. Hence
I assume that internally something must have changed.

--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

dirdi
In reply to this post by Keith Medcalf
On 8/29/19 9:19 PM, Keith Medcalf wrote:
> Fascinating ... because the optimal query plan would be to do a table scan of tbl1 and then dip into tbl2 for each row to get the update value.  The only meaningful index then would be on tbl2(int) (that is idx3) since an index will be unhelpful for the LIKE constraint.

I noticed that, too. But I stopped wondering about optimizer internals,
back when web-browsers started to JIT compile javascript ;)

> Is the generated code different for the two queries, the one that runs slow -vs- the one that runs fast?  (That is, the output from EXPLAIN ..., not the output from EXPLAIN QUERY PLAN, or from .eqp full in the CLI).

This is the output of EXPLAIN UPDATE before I recreate the index:

> 0 Init 0 73 0 00
> 1 Null 0 4 5 00
> 2 OpenRead 0 13 0 0 00
> 3 Rewind 0 7 0 00
> 4 Rowid 0 5 0 00
> 5 RowSetAdd 4 5 0 00
> 6 Next 0 4 0 01
> 7 OpenWrite 0 13 0 7 00
> 8 OpenWrite 1 18 0 k(2,,) 00
> 9 RowSetRead 4 72 5 00
> 10 NotExists 0 9 5 00
> 11 Rowid 0 6 0 00
> 12 Null 0 7 0 00
> 13 Null 0 8 0 00
> 14 Column 0 3 9 00
> 15 Null 0 10 0 00
> 16 Null 0 11 0 00
> 17 Null 0 12 0 00
> 18 Copy 5 13 0 00
> 19 Null 0 14 0 00
> 20 Column 0 1 15 00
> 21 Column 0 2 16 00
> 22 Null 0 22 22 00
> 23 Noop 5 3 0 00
> 24 Integer 1 23 0 00
> 25 OpenRead 4 11 0 3 00
> 26 OpenRead 6 23 0 k(2,,) 00
> 27 Column 0 2 24 00
> 28 Affinity 24 1 0 D 00
> 29 SeekLE 6 39 24 1 00
> 30 DeferredSeek 6 0 4 00
> 31 Column 4 1 29 00
> 32 Concat 29 28 26 00
> 33 Column 0 1 27 00
> 34 Function0 0 26 25 like(2) 02
> 35 IfNot 25 38 1 00
> 36 IdxRowid 6 22 0 00
> 37 DecrJumpZero 23 39 0 00
> 38 Prev 6 30 0 00
> 39 SCopy 22 17 0 00
> 40 Column 0 4 18 NULL 00
> 41 Column 0 5 19 NULL 00
> 42 Column 0 6 20 0 00
> 43 Affinity 14 7 0 DBDDBDD 00
> 44 SCopy 17 2 0 00
> 45 IntCopy 13 3 0 00
> 46 MakeRecord 2 2 1 00
> 47 MakeRecord 14 7 21 00
> 48 FkIfZero 0 56 0 00
> 49 IsNull 9 56 0 00
> 50 SCopy 9 25 0 00
> 51 MustBeInt 25 55 0 00
> 52 OpenRead 7 11 0 3 00
> 53 NotExists 7 55 25 00
> 54 Goto 0 56 0 00
> 55 FkCounter 0 -1 0 00
> 56 Close 7 0 0 00
> 57 Column 0 3 26 00
> 58 Rowid 0 27 0 00
> 59 IdxDelete 1 26 2 00
> 60 Delete 0 68 13 tbl1 00
> 61 IsNull 17 68 0 00
> 62 SCopy 17 25 0 00
> 63 MustBeInt 25 67 0 00
> 64 OpenRead 8 11 0 3 00
> 65 NotExists 8 67 25 00
> 66 Goto 0 68 0 00
> 67 FkCounter 0 1 0 00
> 68 Close 8 0 0 00
> 69 IdxInsert 1 1 2 2 00
> 70 Insert 0 21 13 tbl1 05
> 71 Goto 0 9 0 00
> 72 Halt 0 0 0 00
> 73 Transaction 0 1 20 0 01
> 74 String8 0 28 0 % 00
> 75 Goto 0 1 0 00

... and that is the output afterwards:

> 0 Init 0 73 0 00
> 1 Null 0 4 5 00
> 2 OpenRead 0 13 0 0 00
> 3 Rewind 0 7 0 00
> 4 Rowid 0 5 0 00
> 5 RowSetAdd 4 5 0 00
> 6 Next 0 4 0 01
> 7 OpenWrite 0 13 0 7 00
> 8 OpenWrite 2 18 0 k(2,,) 00
> 9 RowSetRead 4 72 5 00
> 10 NotExists 0 9 5 00
> 11 Rowid 0 6 0 00
> 12 Null 0 7 0 00
> 13 Null 0 8 0 00
> 14 Column 0 3 9 00
> 15 Null 0 10 0 00
> 16 Null 0 11 0 00
> 17 Null 0 12 0 00
> 18 Copy 5 13 0 00
> 19 Null 0 14 0 00
> 20 Column 0 1 15 00
> 21 Column 0 2 16 00
> 22 Null 0 22 22 00
> 23 Noop 5 3 0 00
> 24 Integer 1 23 0 00
> 25 OpenRead 4 11 0 3 00
> 26 OpenRead 6 23 0 k(2,,) 00
> 27 Column 0 2 24 00
> 28 Affinity 24 1 0 D 00
> 29 SeekLE 6 39 24 1 00
> 30 DeferredSeek 6 0 4 00
> 31 Column 4 1 29 00
> 32 Concat 29 28 26 00
> 33 Column 0 1 27 00
> 34 Function0 0 26 25 like(2) 02
> 35 IfNot 25 38 1 00
> 36 IdxRowid 6 22 0 00
> 37 DecrJumpZero 23 39 0 00
> 38 Prev 6 30 0 00
> 39 SCopy 22 17 0 00
> 40 Column 0 4 18 NULL 00
> 41 Column 0 5 19 NULL 00
> 42 Column 0 6 20 0 00
> 43 Affinity 14 7 0 DBDDBDD 00
> 44 SCopy 17 2 0 00
> 45 IntCopy 13 3 0 00
> 46 MakeRecord 2 2 1 00
> 47 MakeRecord 14 7 21 00
> 48 FkIfZero 0 56 0 00
> 49 IsNull 9 56 0 00
> 50 SCopy 9 25 0 00
> 51 MustBeInt 25 55 0 00
> 52 OpenRead 7 11 0 3 00
> 53 NotExists 7 55 25 00
> 54 Goto 0 56 0 00
> 55 FkCounter 0 -1 0 00
> 56 Close 7 0 0 00
> 57 Column 0 3 26 00
> 58 Rowid 0 27 0 00
> 59 IdxDelete 2 26 2 00
> 60 Delete 0 68 13 tbl1 00
> 61 IsNull 17 68 0 00
> 62 SCopy 17 25 0 00
> 63 MustBeInt 25 67 0 00
> 64 OpenRead 8 11 0 3 00
> 65 NotExists 8 67 25 00
> 66 Goto 0 68 0 00
> 67 FkCounter 0 1 0 00
> 68 Close 8 0 0 00
> 69 IdxInsert 2 1 2 2 00
> 70 Insert 0 21 13 tbl1 05
> 71 Goto 0 9 0 00
> 72 Halt 0 0 0 00
> 73 Transaction 0 1 22 0 01
> 74 String8 0 28 0 % 00
> 75 Goto 0 1 0 00

Indeed, they are not identical:

> $ diff explain{1,2}
> 9c9
> < > 8 OpenWrite 1 18 0 k(2,,) 00
> ---
>> > 8 OpenWrite 2 18 0 k(2,,) 00
> 60c60
> < > 59 IdxDelete 1 26 2 00
> ---
>> > 59 IdxDelete 2 26 2 00
> 70c70
> < > 69 IdxInsert 1 1 2 2 00
> ---
>> > 69 IdxInsert 2 1 2 2 00
> 74c74
> < > 73 Transaction 0 1 20 0 01
> ---
>> > 73 Transaction 0 1 22 0 01


> Also, what version of SQLite3 are you using?

> SELECT sqlite_version();

returns

> 3.29.0
Packages installed on my dev machine:
> $ dpkg-query -l "*sqlite*"
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name                    Version        Architecture Description
> +++-=======================-==============-============-=================================
> ii  libqt5sql5-sqlite:amd64 5.11.3+dfsg1-4 amd64        Qt 5 SQLite 3 database driver
> ii  libsqlite3-0:amd64      3.29.0-2       amd64        SQLite 3 shared library
> ii  sqlitebrowser           3.11.2-1       amd64        GUI editor for SQLite databases

--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

Keith Medcalf

Fascinating again, because the code is identical.  The p1 difference in the OpenWrite/IdxInsert/IdxDelete opcodes is just the "file handle" that happens to be being used and doesn't have any real significance (at least I don't think it does).  The p3 in the Transaction opcode is merely the schemacookie and not only indicates that the schema has seen 2 changes between these plans being generated (presumably the DROP and CREATE of the index).

I am at a loss to explain the difference in execution speed.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of dirdi
>Sent: Thursday, 29 August, 2019 15:10
>To: [hidden email]
>Subject: Re: [sqlite] Unexpected REINDEX behavior.
>
>On 8/29/19 9:19 PM, Keith Medcalf wrote:
>> Fascinating ... because the optimal query plan would be to do a
>table scan of tbl1 and then dip into tbl2 for each row to get the
>update value.  The only meaningful index then would be on tbl2(int)
>(that is idx3) since an index will be unhelpful for the LIKE
>constraint.
>
>I noticed that, too. But I stopped wondering about optimizer
>internals,
>back when web-browsers started to JIT compile javascript ;)
>
>> Is the generated code different for the two queries, the one that
>runs slow -vs- the one that runs fast?  (That is, the output from
>EXPLAIN ..., not the output from EXPLAIN QUERY PLAN, or from .eqp
>full in the CLI).
>
>This is the output of EXPLAIN UPDATE before I recreate the index:
>> 0 Init 0 73 0 00
>> 1 Null 0 4 5 00
>> 2 OpenRead 0 13 0 0 00
>> 3 Rewind 0 7 0 00
>> 4 Rowid 0 5 0 00
>> 5 RowSetAdd 4 5 0 00
>> 6 Next 0 4 0 01
>> 7 OpenWrite 0 13 0 7 00
>> 8 OpenWrite 1 18 0 k(2,,) 00
>> 9 RowSetRead 4 72 5 00
>> 10 NotExists 0 9 5 00
>> 11 Rowid 0 6 0 00
>> 12 Null 0 7 0 00
>> 13 Null 0 8 0 00
>> 14 Column 0 3 9 00
>> 15 Null 0 10 0 00
>> 16 Null 0 11 0 00
>> 17 Null 0 12 0 00
>> 18 Copy 5 13 0 00
>> 19 Null 0 14 0 00
>> 20 Column 0 1 15 00
>> 21 Column 0 2 16 00
>> 22 Null 0 22 22 00
>> 23 Noop 5 3 0 00
>> 24 Integer 1 23 0 00
>> 25 OpenRead 4 11 0 3 00
>> 26 OpenRead 6 23 0 k(2,,) 00
>> 27 Column 0 2 24 00
>> 28 Affinity 24 1 0 D 00
>> 29 SeekLE 6 39 24 1 00
>> 30 DeferredSeek 6 0 4 00
>> 31 Column 4 1 29 00
>> 32 Concat 29 28 26 00
>> 33 Column 0 1 27 00
>> 34 Function0 0 26 25 like(2) 02
>> 35 IfNot 25 38 1 00
>> 36 IdxRowid 6 22 0 00
>> 37 DecrJumpZero 23 39 0 00
>> 38 Prev 6 30 0 00
>> 39 SCopy 22 17 0 00
>> 40 Column 0 4 18 NULL 00
>> 41 Column 0 5 19 NULL 00
>> 42 Column 0 6 20 0 00
>> 43 Affinity 14 7 0 DBDDBDD 00
>> 44 SCopy 17 2 0 00
>> 45 IntCopy 13 3 0 00
>> 46 MakeRecord 2 2 1 00
>> 47 MakeRecord 14 7 21 00
>> 48 FkIfZero 0 56 0 00
>> 49 IsNull 9 56 0 00
>> 50 SCopy 9 25 0 00
>> 51 MustBeInt 25 55 0 00
>> 52 OpenRead 7 11 0 3 00
>> 53 NotExists 7 55 25 00
>> 54 Goto 0 56 0 00
>> 55 FkCounter 0 -1 0 00
>> 56 Close 7 0 0 00
>> 57 Column 0 3 26 00
>> 58 Rowid 0 27 0 00
>> 59 IdxDelete 1 26 2 00
>> 60 Delete 0 68 13 tbl1 00
>> 61 IsNull 17 68 0 00
>> 62 SCopy 17 25 0 00
>> 63 MustBeInt 25 67 0 00
>> 64 OpenRead 8 11 0 3 00
>> 65 NotExists 8 67 25 00
>> 66 Goto 0 68 0 00
>> 67 FkCounter 0 1 0 00
>> 68 Close 8 0 0 00
>> 69 IdxInsert 1 1 2 2 00
>> 70 Insert 0 21 13 tbl1 05
>> 71 Goto 0 9 0 00
>> 72 Halt 0 0 0 00
>> 73 Transaction 0 1 20 0 01
>> 74 String8 0 28 0 % 00
>> 75 Goto 0 1 0 00
>
>... and that is the output afterwards:
>> 0 Init 0 73 0 00
>> 1 Null 0 4 5 00
>> 2 OpenRead 0 13 0 0 00
>> 3 Rewind 0 7 0 00
>> 4 Rowid 0 5 0 00
>> 5 RowSetAdd 4 5 0 00
>> 6 Next 0 4 0 01
>> 7 OpenWrite 0 13 0 7 00
>> 8 OpenWrite 2 18 0 k(2,,) 00
>> 9 RowSetRead 4 72 5 00
>> 10 NotExists 0 9 5 00
>> 11 Rowid 0 6 0 00
>> 12 Null 0 7 0 00
>> 13 Null 0 8 0 00
>> 14 Column 0 3 9 00
>> 15 Null 0 10 0 00
>> 16 Null 0 11 0 00
>> 17 Null 0 12 0 00
>> 18 Copy 5 13 0 00
>> 19 Null 0 14 0 00
>> 20 Column 0 1 15 00
>> 21 Column 0 2 16 00
>> 22 Null 0 22 22 00
>> 23 Noop 5 3 0 00
>> 24 Integer 1 23 0 00
>> 25 OpenRead 4 11 0 3 00
>> 26 OpenRead 6 23 0 k(2,,) 00
>> 27 Column 0 2 24 00
>> 28 Affinity 24 1 0 D 00
>> 29 SeekLE 6 39 24 1 00
>> 30 DeferredSeek 6 0 4 00
>> 31 Column 4 1 29 00
>> 32 Concat 29 28 26 00
>> 33 Column 0 1 27 00
>> 34 Function0 0 26 25 like(2) 02
>> 35 IfNot 25 38 1 00
>> 36 IdxRowid 6 22 0 00
>> 37 DecrJumpZero 23 39 0 00
>> 38 Prev 6 30 0 00
>> 39 SCopy 22 17 0 00
>> 40 Column 0 4 18 NULL 00
>> 41 Column 0 5 19 NULL 00
>> 42 Column 0 6 20 0 00
>> 43 Affinity 14 7 0 DBDDBDD 00
>> 44 SCopy 17 2 0 00
>> 45 IntCopy 13 3 0 00
>> 46 MakeRecord 2 2 1 00
>> 47 MakeRecord 14 7 21 00
>> 48 FkIfZero 0 56 0 00
>> 49 IsNull 9 56 0 00
>> 50 SCopy 9 25 0 00
>> 51 MustBeInt 25 55 0 00
>> 52 OpenRead 7 11 0 3 00
>> 53 NotExists 7 55 25 00
>> 54 Goto 0 56 0 00
>> 55 FkCounter 0 -1 0 00
>> 56 Close 7 0 0 00
>> 57 Column 0 3 26 00
>> 58 Rowid 0 27 0 00
>> 59 IdxDelete 2 26 2 00
>> 60 Delete 0 68 13 tbl1 00
>> 61 IsNull 17 68 0 00
>> 62 SCopy 17 25 0 00
>> 63 MustBeInt 25 67 0 00
>> 64 OpenRead 8 11 0 3 00
>> 65 NotExists 8 67 25 00
>> 66 Goto 0 68 0 00
>> 67 FkCounter 0 1 0 00
>> 68 Close 8 0 0 00
>> 69 IdxInsert 2 1 2 2 00
>> 70 Insert 0 21 13 tbl1 05
>> 71 Goto 0 9 0 00
>> 72 Halt 0 0 0 00
>> 73 Transaction 0 1 22 0 01
>> 74 String8 0 28 0 % 00
>> 75 Goto 0 1 0 00
>
>Indeed, they are not identical:
>> $ diff explain{1,2}
>> 9c9
>> < > 8 OpenWrite 1 18 0 k(2,,) 00
>> ---
>>> > 8 OpenWrite 2 18 0 k(2,,) 00
>> 60c60
>> < > 59 IdxDelete 1 26 2 00
>> ---
>>> > 59 IdxDelete 2 26 2 00
>> 70c70
>> < > 69 IdxInsert 1 1 2 2 00
>> ---
>>> > 69 IdxInsert 2 1 2 2 00
>> 74c74
>> < > 73 Transaction 0 1 20 0 01
>> ---
>>> > 73 Transaction 0 1 22 0 01
>
>
>> Also, what version of SQLite3 are you using?
>
>> SELECT sqlite_version();
>
>returns
>
>> 3.29.0
>Packages installed on my dev machine:
>> $ dpkg-query -l "*sqlite*"
>> Desired=Unknown/Install/Remove/Purge/Hold
>> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-
>aWait/Trig-pend
>> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
>> ||/ Name                    Version        Architecture Description
>> +++-=======================-==============-============-
>=================================
>> ii  libqt5sql5-sqlite:amd64 5.11.3+dfsg1-4 amd64        Qt 5 SQLite
>3 database driver
>> ii  libsqlite3-0:amd64      3.29.0-2       amd64        SQLite 3
>shared library
>> ii  sqlitebrowser           3.11.2-1       amd64        GUI editor
>for SQLite databases
>
>--
>Best regards
>
>dirdi
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

dirdi
On 8/30/19 3:34 AM, Keith Medcalf wrote:
> Fascinating again, because the code is identical.  The p1 difference in the OpenWrite/IdxInsert/IdxDelete opcodes is just the "file handle" that happens to be being used and doesn't have any real significance (at least I don't think it does).  The p3 in the Transaction opcode is merely the schemacookie and not only indicates that the schema has seen 2 changes between these plans being generated (presumably the DROP and CREATE of the index).
>
> I am at a loss to explain the difference in execution speed.

Do not bother! I was just curious if this was expected behavior. Until
now I put about 5h into solving this little riddle. With DROP INDEX /
CREATE INDEX I got something that works for me and therefore I will
leave it at that.

However, if one wants to investigate further, I am happy to provide
additional information.


--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

David Raymond
In reply to this post by dirdi
Sorry if my mind is working slowly today, but why are those showing up as a difference when they're exactly the same line?


The only difference between both runs:
> $ diff run1 run2
> 1260d1259
> < INSERT INTO space_used VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
> 1270a1270
>> INSERT INTO space_used VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

dirdi
On 8/30/19 3:42 PM, David Raymond wrote:
> Sorry if my mind is working slowly today, but why are those showing up as a difference when they're exactly the same line?
>
>
> The only difference between both runs:
>> $ diff run1 run2
>> 1260d1259
>> < INSERT INTO space_used VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
>> 1270a1270
>>> INSERT INTO space_used VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);

Because the line has been moved ;)

--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected REINDEX behavior.

dirdi
In reply to this post by dirdi
On 8/30/19 3:59 AM, dirdi wrote:
> Do not bother!
- "bother!"
+ "worry!"

Sorry @Keith, English is only a second language to me and sometimes I
mix-up idioms. You all were very helpful and I learned a lot so far. I
just feel a bit dumb right now =/


--
Best regards

dirdi
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users