Query planner regression with FTS4: slower path is chosen

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Query planner regression with FTS4: slower path is chosen

Paul
I have a test case when the regression can be observed in queries that
use JOINs with FTS4 tables, somewhere in between 3.22.0 and 3.23.0.
For some reason the planner decides to search non-FTS table first then
scan the whole FTS table. Version 3.22.0 is the last unaffected, while
issue is still present in HEAD.

Probably it has something to do with a fact that, according to EXPLAIN,
new version of planner ignores LEFT join and considers it just a JOIN.
At least it feels that way, anyway.

Test case:


CREATE VIRTUAL TABLE search USING FTS4(text);

WITH RECURSIVE
 cnt(x) AS (
    SELECT 1
    UNION ALL
    SELECT x+1 FROM cnt
     LIMIT 2000
 )
INSERT INTO search(docid, text) SELECT x, 'test' || x FROM cnt;

CREATE TABLE foo(s_docid integer primary key, bar integer);

WITH RECURSIVE
 cnt(x) AS (
    SELECT 1
    UNION ALL
    SELECT x+1 FROM cnt
     LIMIT 2000
 )
INSERT INTO foo(s_docid, bar) SELECT x, 1 FROM cnt;

.timer on

-- Fast
SELECT COUNT() FROM search LEFT JOIN foo  
 ON s_docid = docid
 WHERE bar = 1 AND search MATCH 'test*';

-- Fast
SELECT COUNT() FROM foo
 WHERE bar = 1
 AND s_docid IN (
   SELECT docid FROM search WHERE search MATCH 'test*'
 );

-- Create index, as some real-life queries use searches by `bar`
CREATE INDEX foo_bar_idx ON foo (bar);

-- Slow
SELECT COUNT() FROM search LEFT JOIN foo
 ON s_docid = docid
 WHERE bar = 1 AND search MATCH 'test*';

-- As fast as before (current workaround)
SELECT COUNT() FROM foo
 WHERE bar = 1
 AND s_docid IN (
   SELECT docid FROM search WHERE search MATCH 'test*'
 );



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query planner regression with FTS4: slower path is chosen

Paul
I has been a while without response, so I just bumping this message.


19 July 2019, 14:21:27, by "Paul" <[hidden email]>:

> I have a test case when the regression can be observed in queries that
> use JOINs with FTS4 tables, somewhere in between 3.22.0 and 3.23.0.
> For some reason the planner decides to search non-FTS table first then
> scan the whole FTS table. Version 3.22.0 is the last unaffected, while
> issue is still present in HEAD.
>
> Probably it has something to do with a fact that, according to EXPLAIN,
> new version of planner ignores LEFT join and considers it just a JOIN.
> At least it feels that way, anyway.
>
> Test case:
>
>
> CREATE VIRTUAL TABLE search USING FTS4(text);
>
> WITH RECURSIVE
>  cnt(x) AS (
>     SELECT 1
>     UNION ALL
>     SELECT x+1 FROM cnt
>      LIMIT 2000
>  )
> INSERT INTO search(docid, text) SELECT x, 'test' || x FROM cnt;
>
> CREATE TABLE foo(s_docid integer primary key, bar integer);
>
> WITH RECURSIVE
>  cnt(x) AS (
>     SELECT 1
>     UNION ALL
>     SELECT x+1 FROM cnt
>      LIMIT 2000
>  )
> INSERT INTO foo(s_docid, bar) SELECT x, 1 FROM cnt;
>
> .timer on
>
> -- Fast
> SELECT COUNT() FROM search LEFT JOIN foo  
>  ON s_docid = docid
>  WHERE bar = 1 AND search MATCH 'test*';
>
> -- Fast
> SELECT COUNT() FROM foo
>  WHERE bar = 1
>  AND s_docid IN (
>    SELECT docid FROM search WHERE search MATCH 'test*'
>  );
>
> -- Create index, as some real-life queries use searches by `bar`
> CREATE INDEX foo_bar_idx ON foo (bar);
>
> -- Slow
> SELECT COUNT() FROM search LEFT JOIN foo
>  ON s_docid = docid
>  WHERE bar = 1 AND search MATCH 'test*';
>
> -- As fast as before (current workaround)
> SELECT COUNT() FROM foo
>  WHERE bar = 1
>  AND s_docid IN (
>    SELECT docid FROM search WHERE search MATCH 'test*'
>  );
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Query planner regression with FTS4: slower path is chosen

Dan Kennedy-4

On 2/9/62 16:57, Paul wrote:

> I has been a while without response, so I just bumping this message.
>
>
> 19 July 2019, 14:21:27, by "Paul" <[hidden email]>:
>
>> I have a test case when the regression can be observed in queries that
>> use JOINs with FTS4 tables, somewhere in between 3.22.0 and 3.23.0.
>> For some reason the planner decides to search non-FTS table first then
>> scan the whole FTS table. Version 3.22.0 is the last unaffected, while
>> issue is still present in HEAD.
>>
>> Probably it has something to do with a fact that, according to EXPLAIN,
>> new version of planner ignores LEFT join and considers it just a JOIN.


Suspect that that is the change. The LEFT JOIN is equivalent to a
regular join in this case due to the "bar = 1" term in the WHERE clause.

Running ANALYZE after the index is created in the example script causes
SQLite to pick a better plan.

Or, changing the LEFT JOIN to CROSS JOIN works to force SQLite to pick
the plan you want.

FTS5 does a little better with the query, but only because it runs
faster - it still picks the slow plan. There might be room to improve
this in FTS5, but probably not for FTS3/4, which are now focused very
much on backwards compatibility.

Dan.




>> At least it feels that way, anyway.
>>
>> Test case:
>>
>>
>> CREATE VIRTUAL TABLE search USING FTS4(text);
>>
>> WITH RECURSIVE
>>   cnt(x) AS (
>>      SELECT 1
>>      UNION ALL
>>      SELECT x+1 FROM cnt
>>       LIMIT 2000
>>   )
>> INSERT INTO search(docid, text) SELECT x, 'test' || x FROM cnt;
>>
>> CREATE TABLE foo(s_docid integer primary key, bar integer);
>>
>> WITH RECURSIVE
>>   cnt(x) AS (
>>      SELECT 1
>>      UNION ALL
>>      SELECT x+1 FROM cnt
>>       LIMIT 2000
>>   )
>> INSERT INTO foo(s_docid, bar) SELECT x, 1 FROM cnt;
>>
>> .timer on
>>
>> -- Fast
>> SELECT COUNT() FROM search LEFT JOIN foo
>>   ON s_docid = docid
>>   WHERE bar = 1 AND search MATCH 'test*';
>>
>> -- Fast
>> SELECT COUNT() FROM foo
>>   WHERE bar = 1
>>   AND s_docid IN (
>>     SELECT docid FROM search WHERE search MATCH 'test*'
>>   );
>>
>> -- Create index, as some real-life queries use searches by `bar`
>> CREATE INDEX foo_bar_idx ON foo (bar);
>>
>> -- Slow
>> SELECT COUNT() FROM search LEFT JOIN foo
>>   ON s_docid = docid
>>   WHERE bar = 1 AND search MATCH 'test*';
>>
>> -- As fast as before (current workaround)
>> SELECT COUNT() FROM foo
>>   WHERE bar = 1
>>   AND s_docid IN (
>>     SELECT docid FROM search WHERE search MATCH 'test*'
>>   );
>>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users