Performance regression since 3.19.0

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Performance regression since 3.19.0

Peter Inglesby
Hi,

I've encountered a performance regression that was introduced in
41c27bc0ff1d3135 (3.19.0 2017-04-18 11:20:19).  Following the guidance in
"How To Report Bugs Against SQLite" on the wiki, I'm reporting it here.

With these tables and indexes:

CREATE TABLE t1 (a TEXT, b TEXT);
CREATE TABLE t2 (a TEXT, b TEXT);
CREATE INDEX t1_a ON t1 (a);
CREATE INDEX t1_b ON t1 (b);
CREATE INDEX t2_a ON t2 (a);
CREATE INDEX t2_b ON t2 (b);

this query now returns very slowly for large amounts of data:

SELECT *
FROM (SELECT * FROM t1 WHERE a = '123') s1
LEFT OUTER JOIN (SELECT * FROM t2 WHERE a = '123') s2
ON s1.b = s2.b;

This is the query plan after the regression:

0|0|0|SEARCH TABLE t1 USING INDEX t1_a (a=?)
0|1|1|SEARCH TABLE t2 USING INDEX t2_b (b=?)

And this is the query plan before the regression:

1|0|0|SEARCH TABLE t2 USING INDEX t2_a (a=?)
0|0|0|SEARCH TABLE t1 USING INDEX t1_a (a=?)
0|1|1|SEARCH SUBQUERY 1 AS s2 USING AUTOMATIC COVERING INDEX (b=?)

I have verified that the performance problem exists in the latest
checkout.  Is there any more information I could provide?

Thanks for your work on a great tool!

Peter.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Performance regression since 3.19.0

Richard Hipp-3
How does the performance compare with this:

CREATE TABLE t1 (a TEXT, b TEXT);
CREATE TABLE t2 (a TEXT, b TEXT);
CREATE INDEX t1_a ON t1 (a,b);
CREATE INDEX t2_a ON t2 (a,b);

SELECT *
FROM t1 LEFT JOIN t2 ON (t1.b=t2.b AND t2.a='123')
WHERE t1.a='123';


On 1/5/20, Peter Inglesby <[hidden email]> wrote:
> Is there any more information I could provide?

Run "ANALYZE;" on a database that contains actual data, then send us
the output of ".fullschema"

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Performance regression since 3.19.0

Peter Inglesby
> How does the performance compare with this:
>
> CREATE TABLE t1 (a TEXT, b TEXT);
> CREATE TABLE t2 (a TEXT, b TEXT);
> CREATE INDEX t1_a ON t1 (a,b);
> CREATE INDEX t2_a ON t2 (a,b);
>
> SELECT *
> FROM t1 LEFT JOIN t2 ON (t1.b=t2.b AND t2.a='123')
> WHERE t1.a='123';
>

Here are best-of-three averages for my version and your version for both
the good and bad commits:

good / mine :: <0.1s
bad / mine :: 12.5s
good / yours :: 0.2s
bad / yours :: 0.2s

In other words, there is no regression for your version.  But it does not
return as quickly as my version does before the regression.


> Run "ANALYZE;" on a database that contains actual data, then send us
> the output of ".fullschema"
>

Here you go:

sqlite> analyze;
sqlite> .fullschema
CREATE TABLE t1 (a TEXT, b TEXT);
CREATE TABLE t2 (a TEXT, b TEXT);
CREATE INDEX t1_a ON t1 (a);
CREATE INDEX t1_b ON t1 (b);
CREATE INDEX t2_a ON t2 (a);
CREATE INDEX t2_b ON t2 (b);
ANALYZE sqlite_master;
ANALYZE sqlite_master;
INSERT INTO sqlite_stat1 VALUES('t2','t2_b','1000000 1000');
INSERT INTO sqlite_stat1 VALUES('t2','t2_a','1000000 1000');
INSERT INTO sqlite_stat1 VALUES('t1','t1_b','1000000 1000');
INSERT INTO sqlite_stat1 VALUES('t1','t1_a','1000000 1000');
ANALYZE sqlite_master;

The sqlite_stat1 values make sense, because there are 1,000,000 rows in
each of t1 and t2, with 1,000 values for each of a and b.

After running ANALYZE, the origin query returns in 2.7s, but the query plan
is unchanged.

On Mon, 6 Jan 2020 at 01:53, Richard Hipp <[hidden email]> wrote:

> How does the performance compare with this:
>
> CREATE TABLE t1 (a TEXT, b TEXT);
> CREATE TABLE t2 (a TEXT, b TEXT);
> CREATE INDEX t1_a ON t1 (a,b);
> CREATE INDEX t2_a ON t2 (a,b);
>
> SELECT *
> FROM t1 LEFT JOIN t2 ON (t1.b=t2.b AND t2.a='123')
> WHERE t1.a='123';
>
>
> On 1/5/20, Peter Inglesby <[hidden email]> wrote:
> > Is there any more information I could provide?
>
> Run "ANALYZE;" on a database that contains actual data, then send us
> the output of ".fullschema"
>
> --
> D. Richard Hipp
> [hidden email]
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Performance regression since 3.19.0

Peter Inglesby
Hi -- I've not heard anything more about this, and I don't see a bug listed
at https://www.sqlite.org/src/rptview?rn=1.

Will it be addressed as a bug?

I hope I'm not coming across as demanding a fix -- I just want to make sure
this hasn't fallen through the gaps!

On Mon, 6 Jan 2020 at 20:24, Peter Inglesby <[hidden email]>
wrote:

>
> How does the performance compare with this:
>>
>> CREATE TABLE t1 (a TEXT, b TEXT);
>> CREATE TABLE t2 (a TEXT, b TEXT);
>> CREATE INDEX t1_a ON t1 (a,b);
>> CREATE INDEX t2_a ON t2 (a,b);
>>
>> SELECT *
>> FROM t1 LEFT JOIN t2 ON (t1.b=t2.b AND t2.a='123')
>> WHERE t1.a='123';
>>
>
> Here are best-of-three averages for my version and your version for both
> the good and bad commits:
>
> good / mine :: <0.1s
> bad / mine :: 12.5s
> good / yours :: 0.2s
> bad / yours :: 0.2s
>
> In other words, there is no regression for your version.  But it does not
> return as quickly as my version does before the regression.
>
>
>> Run "ANALYZE;" on a database that contains actual data, then send us
>> the output of ".fullschema"
>>
>
> Here you go:
>
> sqlite> analyze;
> sqlite> .fullschema
> CREATE TABLE t1 (a TEXT, b TEXT);
> CREATE TABLE t2 (a TEXT, b TEXT);
> CREATE INDEX t1_a ON t1 (a);
> CREATE INDEX t1_b ON t1 (b);
> CREATE INDEX t2_a ON t2 (a);
> CREATE INDEX t2_b ON t2 (b);
> ANALYZE sqlite_master;
> ANALYZE sqlite_master;
> INSERT INTO sqlite_stat1 VALUES('t2','t2_b','1000000 1000');
> INSERT INTO sqlite_stat1 VALUES('t2','t2_a','1000000 1000');
> INSERT INTO sqlite_stat1 VALUES('t1','t1_b','1000000 1000');
> INSERT INTO sqlite_stat1 VALUES('t1','t1_a','1000000 1000');
> ANALYZE sqlite_master;
>
> The sqlite_stat1 values make sense, because there are 1,000,000 rows in
> each of t1 and t2, with 1,000 values for each of a and b.
>
> After running ANALYZE, the origin query returns in 2.7s, but the query
> plan is unchanged.
>
> On Mon, 6 Jan 2020 at 01:53, Richard Hipp <[hidden email]> wrote:
>
>> How does the performance compare with this:
>>
>> CREATE TABLE t1 (a TEXT, b TEXT);
>> CREATE TABLE t2 (a TEXT, b TEXT);
>> CREATE INDEX t1_a ON t1 (a,b);
>> CREATE INDEX t2_a ON t2 (a,b);
>>
>> SELECT *
>> FROM t1 LEFT JOIN t2 ON (t1.b=t2.b AND t2.a='123')
>> WHERE t1.a='123';
>>
>>
>> On 1/5/20, Peter Inglesby <[hidden email]> wrote:
>> > Is there any more information I could provide?
>>
>> Run "ANALYZE;" on a database that contains actual data, then send us
>> the output of ".fullschema"
>>
>> --
>> D. Richard Hipp
>> [hidden email]
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users