speeding up an FTS5 search with a JOIN

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

speeding up an FTS5 search with a JOIN

Puneet Kishor-2
I asked this question on Stackoverflow with not much success, and a suggestion to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;

QUERY
PLAN
--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)

sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
308498

Run Time
: real 0.043 user 0.023668 sys 0.009005
```


As can be see above, the actual query takes ~43ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';

QUERY
PLAN
--SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:

sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
80789

Run Time
: real 0.047 user 0.008021 sys 0.009640
```

The actual query, in this case, takes ~47ms. So far so good. But the problem occurs when I join the two tables

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';

QUERY
PLAN
|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
 --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)

sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
80789

Run Time
: real 26.218 user 1.396376 sys 5.413630
```

The answer is correct but the query takes more than 26 seconds! Of course, I would like to speed up this query by several orders of magnitude, but I would also like to understand why this join is causing the slowdown.

Now, the reason I have constructed a query like this is because users can add further constraints for the table t1. For example,

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH 'foo’;
```

Also, in every operation, for every given constraint, two queries are performed, one that returns the count and the other that returns the actual columns. And, finally, only a subset of the results are returned using LIMIT and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the last constraint above would result in the following

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY <some t1 col>
   ...> LIMIT 30 OFFSET <some offset calculated by pagination>;
```

When no t1 columns are prescribed in the constraint, the default count (shown above) and default cols are returned with the FTS search

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY <some t1 col>
   ...> LIMIT 30 OFFSET 0;
```
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up an FTS5 search with a JOIN

Dan Kennedy-4

On 7/3/63 14:58, P Kishor wrote:

> I asked this question on Stackoverflow with not much success, and a suggestion to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>     ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
>
> QUERY
> PLAN
> --SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)
>
> sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
> 308498
>
> Run Time
> : real 0.043 user 0.023668 sys 0.009005
> ```
>
>
> As can be see above, the actual query takes ~43ms
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>     ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
>
> QUERY
> PLAN
> --SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:
>
> sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
> 80789
>
> Run Time
> : real 0.047 user 0.008021 sys 0.009640
> ```
>
> The actual query, in this case, takes ~47ms. So far so good. But the problem occurs when I join the two tables
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>     ...> SELECT Count(*) as num
>     ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>     ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
>
> QUERY
> PLAN
> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
>   --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)
>
> sqlite> SELECT Count(*) as num
>     ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>     ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
> 80789


That does seem slow. Are there many rows in table "t1" with t1.deleted
set to something other than 0? What does:

   SELECT count(*) AS num FROM t1 JOIN vt1 ON t1.t1Id=vt1.t1Id WHERE vt1
MATCH 'foo';

return?

Dan.




>
> Run Time
> : real 26.218 user 1.396376 sys 5.413630
> ```
>
> The answer is correct but the query takes more than 26 seconds! Of course, I would like to speed up this query by several orders of magnitude, but I would also like to understand why this join is causing the slowdown.
>
> Now, the reason I have constructed a query like this is because users can add further constraints for the table t1. For example,
>
> ```
> sqlite> SELECT Count(*) as num
>     ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>     ...> WHERE t1.deleted = 0 AND
>     ...> WHERE t1.frob = ‘bar' AND
>     ...> WHERE t1.nob = ‘baz' AND
>     ...> vt1 MATCH 'foo’;
> ```
>
> Also, in every operation, for every given constraint, two queries are performed, one that returns the count and the other that returns the actual columns. And, finally, only a subset of the results are returned using LIMIT and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the last constraint above would result in the following
>
> ```
> sqlite> SELECT Count(*) as num
>     ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>     ...> WHERE t1.deleted = 0 AND
>     ...> WHERE t1.frob = ‘bar' AND
>     ...> WHERE t1.nob = ‘baz' AND
>     ...> vt1 MATCH 'foo’;
> 20367
>
> sqlite> SELECT t1.id, t1.t1Id, … other cols …,
>     ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>     ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>     ...> WHERE t1.deleted = 0 AND
>     ...> WHERE t1.frob = ‘bar' AND
>     ...> WHERE t1.nob = ‘baz' AND
>     ...> vt1 MATCH ‘foo’
>     ...> ORDER BY <some t1 col>
>     ...> LIMIT 30 OFFSET <some offset calculated by pagination>;
> ```
>
> When no t1 columns are prescribed in the constraint, the default count (shown above) and default cols are returned with the FTS search
>
> ```
> sqlite> SELECT Count(*) as num
>     ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>     ...> WHERE t1.deleted = 0
>     ...> vt1 MATCH 'foo’;
> 20367
>
> sqlite> SELECT t1.id, t1.t1Id, … other cols …,
>     ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>     ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>     ...> WHERE t1.deleted = 0
>     ...> vt1 MATCH ‘foo’
>     ...> ORDER BY <some t1 col>
>     ...> LIMIT 30 OFFSET 0;
> ```
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up an FTS5 search with a JOIN

Puneet Kishor-2
In reply to this post by Puneet Kishor-2
Update: so, after much hitting of my head against the wall of sql, I came up with the following – as noted above, I really have two distinct set of queries I can do separately like so

Q1: (SELECT t1Id FROM t1 WHERE …) AS a

Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b

Then, I can do the following -

SELECT Count(*) FROM a WHERE a.t1Id IN b

Of course, in reality, I don’t do this separately but all in one go to make a really messy SQL but a really fast query, a couple of hundred ms as opposed to > 25s

You might notice that in my Q2 above I MATCHed for ‘bar’ instead of ‘foo’. That is because ‘bar’ returns fewer rows than ‘foo’ does. The problem remains when there are too many matches in the FTS query in which case the FTS query itself is slow, for example, with ‘foo’ which matches > 80K rows.

Now, one interesting point of comparison – the same kind of query (from the user point-of-view) against an ElasticSearch instance (that is, all the rows with ‘foo’ anywhere in the text) is very fast, in the order of sub-hundred ms. I realize it might be unfair comparing SQLite with ElasticSearch, but still. (Or, is it a fair comparison?)


> On Mar 7, 2020, at 8:59 AM, P Kishor <[hidden email]> wrote:
>
> I asked this question on Stackoverflow with not much success, and a suggestion to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
>
> QUERY
> PLAN
> --SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)
>
> sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
> 308498
>
> Run Time
> : real 0.043 user 0.023668 sys 0.009005
> ```
>
>
> As can be see above, the actual query takes ~43ms
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
>
> QUERY
> PLAN
> --SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:
>
> sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
> 80789
>
> Run Time
> : real 0.047 user 0.008021 sys 0.009640
> ```
>
> The actual query, in this case, takes ~47ms. So far so good. But the problem occurs when I join the two tables
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
>
> QUERY
> PLAN
> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)
>
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
> 80789
>
> Run Time
> : real 26.218 user 1.396376 sys 5.413630
> ```
>
> The answer is correct but the query takes more than 26 seconds! Of course, I would like to speed up this query by several orders of magnitude, but I would also like to understand why this join is causing the slowdown.
>
> Now, the reason I have constructed a query like this is because users can add further constraints for the table t1. For example,
>
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND
>   ...> WHERE t1.frob = ‘bar' AND
>   ...> WHERE t1.nob = ‘baz' AND
>   ...> vt1 MATCH 'foo’;
> ```
>
> Also, in every operation, for every given constraint, two queries are performed, one that returns the count and the other that returns the actual columns. And, finally, only a subset of the results are returned using LIMIT and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the last constraint above would result in the following
>
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND
>   ...> WHERE t1.frob = ‘bar' AND
>   ...> WHERE t1.nob = ‘baz' AND
>   ...> vt1 MATCH 'foo’;
> 20367
>
> sqlite> SELECT t1.id, t1.t1Id, … other cols …,
>   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND
>   ...> WHERE t1.frob = ‘bar' AND
>   ...> WHERE t1.nob = ‘baz' AND
>   ...> vt1 MATCH ‘foo’
>   ...> ORDER BY <some t1 col>
>   ...> LIMIT 30 OFFSET <some offset calculated by pagination>;
> ```
>
> When no t1 columns are prescribed in the constraint, the default count (shown above) and default cols are returned with the FTS search
>
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0
>   ...> vt1 MATCH 'foo’;
> 20367
>
> sqlite> SELECT t1.id, t1.t1Id, … other cols …,
>   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0
>   ...> vt1 MATCH ‘foo’
>   ...> ORDER BY <some t1 col>
>   ...> LIMIT 30 OFFSET 0;
> ```
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up an FTS5 search with a JOIN

Dan Kennedy-4

On 9/3/63 01:44, Puneet Kishor wrote:

> Update: so, after much hitting of my head against the wall of sql, I came up with the following – as noted above, I really have two distinct set of queries I can do separately like so
>
> Q1: (SELECT t1Id FROM t1 WHERE …) AS a
>
> Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b
>
> Then, I can do the following -
>
> SELECT Count(*) FROM a WHERE a.t1Id IN b
>
> Of course, in reality, I don’t do this separately but all in one go to make a really messy SQL but a really fast query, a couple of hundred ms as opposed to > 25s


I don't see why that would be any different from your join query. What
does EXPLAIN QUERY PLAN say for the version that takes a few hundred ms?

Dan.



>
> You might notice that in my Q2 above I MATCHed for ‘bar’ instead of ‘foo’. That is because ‘bar’ returns fewer rows than ‘foo’ does. The problem remains when there are too many matches in the FTS query in which case the FTS query itself is slow, for example, with ‘foo’ which matches > 80K rows.
>
> Now, one interesting point of comparison – the same kind of query (from the user point-of-view) against an ElasticSearch instance (that is, all the rows with ‘foo’ anywhere in the text) is very fast, in the order of sub-hundred ms. I realize it might be unfair comparing SQLite with ElasticSearch, but still. (Or, is it a fair comparison?)
>
>
>> On Mar 7, 2020, at 8:59 AM, P Kishor <[hidden email]> wrote:
>>
>> I asked this question on Stackoverflow with not much success, and a suggestion to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)
>>
>> ```
>> sqlite> EXPLAIN QUERY PLAN
>>    ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
>>
>> QUERY
>> PLAN
>> --SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)
>>
>> sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
>> 308498
>>
>> Run Time
>> : real 0.043 user 0.023668 sys 0.009005
>> ```
>>
>>
>> As can be see above, the actual query takes ~43ms
>>
>> ```
>> sqlite> EXPLAIN QUERY PLAN
>>    ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
>>
>> QUERY
>> PLAN
>> --SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:
>>
>> sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
>> 80789
>>
>> Run Time
>> : real 0.047 user 0.008021 sys 0.009640
>> ```
>>
>> The actual query, in this case, takes ~47ms. So far so good. But the problem occurs when I join the two tables
>>
>> ```
>> sqlite> EXPLAIN QUERY PLAN
>>    ...> SELECT Count(*) as num
>>    ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>>    ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
>>
>> QUERY
>> PLAN
>> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
>> --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)
>>
>> sqlite> SELECT Count(*) as num
>>    ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>>    ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
>> 80789
>>
>> Run Time
>> : real 26.218 user 1.396376 sys 5.413630
>> ```
>>
>> The answer is correct but the query takes more than 26 seconds! Of course, I would like to speed up this query by several orders of magnitude, but I would also like to understand why this join is causing the slowdown.
>>
>> Now, the reason I have constructed a query like this is because users can add further constraints for the table t1. For example,
>>
>> ```
>> sqlite> SELECT Count(*) as num
>>    ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>>    ...> WHERE t1.deleted = 0 AND
>>    ...> WHERE t1.frob = ‘bar' AND
>>    ...> WHERE t1.nob = ‘baz' AND
>>    ...> vt1 MATCH 'foo’;
>> ```
>>
>> Also, in every operation, for every given constraint, two queries are performed, one that returns the count and the other that returns the actual columns. And, finally, only a subset of the results are returned using LIMIT and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the last constraint above would result in the following
>>
>> ```
>> sqlite> SELECT Count(*) as num
>>    ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>>    ...> WHERE t1.deleted = 0 AND
>>    ...> WHERE t1.frob = ‘bar' AND
>>    ...> WHERE t1.nob = ‘baz' AND
>>    ...> vt1 MATCH 'foo’;
>> 20367
>>
>> sqlite> SELECT t1.id, t1.t1Id, … other cols …,
>>    ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>>    ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>>    ...> WHERE t1.deleted = 0 AND
>>    ...> WHERE t1.frob = ‘bar' AND
>>    ...> WHERE t1.nob = ‘baz' AND
>>    ...> vt1 MATCH ‘foo’
>>    ...> ORDER BY <some t1 col>
>>    ...> LIMIT 30 OFFSET <some offset calculated by pagination>;
>> ```
>>
>> When no t1 columns are prescribed in the constraint, the default count (shown above) and default cols are returned with the FTS search
>>
>> ```
>> sqlite> SELECT Count(*) as num
>>    ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>>    ...> WHERE t1.deleted = 0
>>    ...> vt1 MATCH 'foo’;
>> 20367
>>
>> sqlite> SELECT t1.id, t1.t1Id, … other cols …,
>>    ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>>    ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>>    ...> WHERE t1.deleted = 0
>>    ...> vt1 MATCH ‘foo’
>>    ...> ORDER BY <some t1 col>
>>    ...> LIMIT 30 OFFSET 0;
>> ```
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up an FTS5 search with a JOIN

Puneet Kishor-2


> On Mar 9, 2020, at 1:04 PM, Dan Kennedy <[hidden email]> wrote:
>
>
> On 9/3/63 01:44, Puneet Kishor wrote:
>> Update: so, after much hitting of my head against the wall of sql, I came up with the following – as noted above, I really have two distinct set of queries I can do separately like so
>>
>> Q1: (SELECT t1Id FROM t1 WHERE …) AS a
>>
>> Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b
>>
>> Then, I can do the following -
>>
>> SELECT Count(*) FROM a WHERE a.t1Id IN b
>>
>> Of course, in reality, I don’t do this separately but all in one go to make a really messy SQL but a really fast query, a couple of hundred ms as opposed to > 25s
>
>
> I don't see why that would be any different from your join query. What does EXPLAIN QUERY PLAN say for the version that takes a few hundred ms?


Here it is, the new query made up of two queries derived from the original query (summarize again below). The query itself takes 301ms.

```
SQLite version 3.30.0 2019-10-04 15:03:17
Enter ".help" for usage hints.
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) FROM
   ...>
   ...> -- this one is 'QUERY a'
   ...> (SELECT Count(DISTINCT t1.t1Id) t1Id
   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id
   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '') a
   ...>
   ...> WHERE a.t1Id IN
   ...>
   ...> -- and this one is 'QUERY b'
   ...> (SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH "foo");
QUERY PLAN
|--CO-ROUTINE 1
|  |--SEARCH TABLE t2 USING INDEX ix_t2 (deleted=? AND deleted=?)
|  `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
|--SCAN SUBQUERY 1 AS a
`--LIST SUBQUERY 2
   `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
sqlite>
```

Here is the original query (QUERY zero). This is the one that takes a very long time, 10.611s. Note, in this one I’ve added an additional table ’t2’ but the concept is the same. Normal tables JOINed to each other, and then JOINed to a virtual table.

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(DISTINCT t1.t1Id)
   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '' AND vt1 MATCH "foo";
QUERY PLAN
|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
sqlite>
```

Here are the two queries derived from QUERY zero. The first one (QUERY a) takes 324ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(DISTINCT t1.t1Id)
   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id
   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '';
QUERY PLAN
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
sqlite>
```

The second query (QUERY b) takes: 27ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH “foo";
QUERY PLAN
`--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
sqlite>
```

One more thing: All of this goes to hell if the virtual table returns way too many matches. For example, I have a term that returns 80K rows from the FTS MATCH. In that case, even my new query very slow because, well, because the QUERY b above is slow.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: speeding up an FTS5 search with a JOIN

Puneet Kishor-2
following up on my own response to Dan…

I see what you mean Dan. I am not an expert at reading the query plans, but I do see that effectively my new query has the same query plan as the last two queries combined as well as the original query. The only difference is the order in which it proceeds. So, the original (slow query)

|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)

The two sub-queries separately that run very fast

`--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m

and
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)

and the new query that also runs very fast using the two sub-queries

QUERY PLAN
|--CO-ROUTINE 1
|  |--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
|  `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
|--SCAN SUBQUERY 1 AS a
`--LIST SUBQUERY 2
  `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m


No idea what is going on.


> On Mar 9, 2020, at 2:08 PM, P Kishor <[hidden email]> wrote:
>
>
>
>> On Mar 9, 2020, at 1:04 PM, Dan Kennedy <[hidden email]> wrote:
>>
>>
>> On 9/3/63 01:44, Puneet Kishor wrote:
>>> Update: so, after much hitting of my head against the wall of sql, I came up with the following – as noted above, I really have two distinct set of queries I can do separately like so
>>>
>>> Q1: (SELECT t1Id FROM t1 WHERE …) AS a
>>>
>>> Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b
>>>
>>> Then, I can do the following -
>>>
>>> SELECT Count(*) FROM a WHERE a.t1Id IN b
>>>
>>> Of course, in reality, I don’t do this separately but all in one go to make a really messy SQL but a really fast query, a couple of hundred ms as opposed to > 25s
>>
>>
>> I don't see why that would be any different from your join query. What does EXPLAIN QUERY PLAN say for the version that takes a few hundred ms?
>
>
> Here it is, the new query made up of two queries derived from the original query (summarize again below). The query itself takes 301ms.
>
> ```
> SQLite version 3.30.0 2019-10-04 15:03:17
> Enter ".help" for usage hints.
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) FROM
>   ...>
>   ...> -- this one is 'QUERY a'
>   ...> (SELECT Count(DISTINCT t1.t1Id) t1Id
>   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id
>   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '') a
>   ...>
>   ...> WHERE a.t1Id IN
>   ...>
>   ...> -- and this one is 'QUERY b'
>   ...> (SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH "foo");
> QUERY PLAN
> |--CO-ROUTINE 1
> |  |--SEARCH TABLE t2 USING INDEX ix_t2 (deleted=? AND deleted=?)
> |  `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
> |--SCAN SUBQUERY 1 AS a
> `--LIST SUBQUERY 2
>   `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> sqlite>
> ```
>
> Here is the original query (QUERY zero). This is the one that takes a very long time, 10.611s. Note, in this one I’ve added an additional table ’t2’ but the concept is the same. Normal tables JOINed to each other, and then JOINed to a virtual table.
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(DISTINCT t1.t1Id)
>   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '' AND vt1 MATCH "foo";
> QUERY PLAN
> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> |--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
> `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
> sqlite>
> ```
>
> Here are the two queries derived from QUERY zero. The first one (QUERY a) takes 324ms
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(DISTINCT t1.t1Id)
>   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id
>   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '';
> QUERY PLAN
> |--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
> `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
> sqlite>
> ```
>
> The second query (QUERY b) takes: 27ms
>
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH “foo";
> QUERY PLAN
> `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> sqlite>
> ```
>
> One more thing: All of this goes to hell if the virtual table returns way too many matches. For example, I have a term that returns 80K rows from the FTS MATCH. In that case, even my new query very slow because, well, because the QUERY b above is slow.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users