Optimisation opportunity on subquery?

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimisation opportunity on subquery?

Jonathan Moules-3
Hi List,

The below seems to my very-non-expert mind like there's scope for
query-plan optimisation.

I have two tables (simplified below):

CREATE TABLE users (
     item_id   TEXT REFERENCES item_info (item_id)
                       NOT NULL
                       COLLATE NOCASE,
     some_data     TEXT,
);

CREATE INDEX users__item_id__idx ON users (
     item_id
);

CREATE TABLE item_info (
     item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
                       NOT NULL
                       COLLATE NOCASE,
     more_data     TEXT
);

There are about 1 million records in users and 100,000 records in item_info.

These queries are all fast, taking about 0.002s:
     select * from item_info where item_id = ?;

     select count(1) from users group by item_id;

     select count(1) from users where item_id = ?;

But when I try and join them together, they're much slower at about 0.5s.
     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users group by item_id)
     USING (item_id)
     where item_id = ?;

I kind of expected SQLite would figure out that the outer WHERE clause
also applies to the subquery given the combination of USING and GROUP BY
means it has to apply anyway.

If I explicitly include the WHERE inside the subquery, the EXPLAIN QUERY
PLAN is identical, but it's back to the expected fast speed (0.002s):
     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users WHERE item_id = ?)
     USING (item_id)
     where item_id = ?;

sqlite 3.24.0

Cheers,

Jonathan



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Keith Medcalf

UNABLE TO REPRODUCE.  

Your CREATE TABLEs do not work (they contain syntax errors).  

The query you complain about taking a long time does not and cannot work because it is attempting to join two tables using a common column name, that column name NOT being contained in one of the tables (that is, the statement is in error and the error is thrown as soon as you try and prepare the statement).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Jonathan Moules
>Sent: Wednesday, 2 January, 2019 09:44
>To: SQLite mailing list
>Subject: [sqlite] Optimisation opportunity on subquery?
>
>Hi List,
>
>The below seems to my very-non-expert mind like there's scope for
>query-plan optimisation.
>
>I have two tables (simplified below):
>
>CREATE TABLE users (
>     item_id   TEXT REFERENCES item_info (item_id)
>                       NOT NULL
>                       COLLATE NOCASE,
>     some_data     TEXT,
>);
>
>CREATE INDEX users__item_id__idx ON users (
>     item_id
>);
>
>CREATE TABLE item_info (
>     item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
>                       NOT NULL
>                       COLLATE NOCASE,
>     more_data     TEXT
>);
>
>There are about 1 million records in users and 100,000 records in
>item_info.
>
>These queries are all fast, taking about 0.002s:
>     select * from item_info where item_id = ?;
>
>     select count(1) from users group by item_id;
>
>     select count(1) from users where item_id = ?;
>
>But when I try and join them together, they're much slower at about
>0.5s.
>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users group by item_id)
>     USING (item_id)
>     where item_id = ?;
>
>I kind of expected SQLite would figure out that the outer WHERE
>clause
>also applies to the subquery given the combination of USING and GROUP
>BY
>means it has to apply anyway.
>
>If I explicitly include the WHERE inside the subquery, the EXPLAIN
>QUERY
>PLAN is identical, but it's back to the expected fast speed (0.002s):
>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users WHERE item_id = ?)
>     USING (item_id)
>     where item_id = ?;
>
>sqlite 3.24.0
>
>Cheers,
>
>Jonathan
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Simon Slavin-3
In reply to this post by Jonathan Moules-3
On 2 Jan 2019, at 4:44pm, Jonathan Moules <[hidden email]> wrote:

>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users group by item_id)
>     USING (item_id)
>     where item_id = ?;

You have an ambiguous column name, and I don't think SQLite is doing what you think it's doing.

Please change the name of the column users.item_id to something else, then try your SELECT again.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

David Raymond
Yup, there was a typo, but you guys know what he meant. He's saying there WHERE could be pushed through the join to the subquery.


sqlite> explain query plan select * from item_info join (select item_id, count(1) from users group by item_id) using (item_id) where item_id = ?;
QUERY PLAN
|--MATERIALIZE 1
|  `--SCAN TABLE users USING COVERING INDEX user__item_id__idx
|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?)
`--SCAN SUBQUERY 1

sqlite> explain query plan select * from item_info join (select item_id, count(1) from users where item_id = ?) using (item_id) where item_id = ?;
QUERY PLAN
|--MATERIALIZE 1
|  `--SEARCH TABLE users USING COVERING INDEX user__item_id__idx (item_id=?)
|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?)
`--SCAN SUBQUERY 1


It basically comes down to not doing algebra, and proving you can use it for each case can be easy for a human, but error prone or long to do for a computer, especially for non-trivial cases. Here the WHERE clause is on one field, that field is used in an INNER join to the subquery, the sub query isn't used anywhere else in the larger query, the field that matches in the sub query is the group by target and not an aggregate field, etc. All those had to be true, and probably more things I'm not thinking of at the moment. So it becomes a not so trivial thing to identify and use safely.


-----Original Message-----
From: sqlite-users [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Wednesday, January 02, 2019 3:34 PM
To: SQLite mailing list
Subject: Re: [sqlite] Optimisation opportunity on subquery?

On 2 Jan 2019, at 4:44pm, Jonathan Moules <[hidden email]> wrote:

>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users group by item_id)
>     USING (item_id)
>     where item_id = ?;

You have an ambiguous column name, and I don't think SQLite is doing what you think it's doing.

Please change the name of the column users.item_id to something else, then try your SELECT again.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Jonathan Moules-3
In reply to this post by Simon Slavin-3
Sorry, but which column is ambiguous? The users.item_id is a foreign key
to the item_info.item_id - that's why it's a "REFERENCES" - why would I
want to change it to be something else? Isn't the convention for FK's to
have the same name across tables? That's what "USING" is for right? (or
NATURAL, but I prefer to be explicit.) Happy to be corrected.


I fixed the typo (a stray comma!), and of course, the REFERENCES table
has to be second, not first. Sorry, should have checked

CREATE TABLE item_info (

item_id TEXT PRIMARY KEY ON CONFLICT IGNORE

NOT NULL

COLLATE NOCASE,

more_data TEXT

);


CREATE TABLE users (

item_id TEXT REFERENCES item_info (item_id)

NOT NULL

COLLATE NOCASE,

some_data TEXT

);


CREATE INDEX users__item_id__idx ON users (

item_id

);


-------


In the end to get the desired result I had to invert the query by
keeping the WHERE clause inside - this one gets the full speed without
needing the WHERE clause twice:

     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users where item_id = ?)
     USING (item_id);


Anyway, just an observation.


Thanks,
Jonathan


On 2019-01-02 20:33, Simon Slavin wrote:

> On 2 Jan 2019, at 4:44pm, Jonathan Moules <[hidden email]> wrote:
>
>>      SELECT
>>          *
>>      FROM
>>          item_info
>>      JOIN (select count(1) from users group by item_id)
>>      USING (item_id)
>>      where item_id = ?;
> You have an ambiguous column name, and I don't think SQLite is doing what you think it's doing.
>
> Please change the name of the column users.item_id to something else, then try your SELECT again.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Simon Slavin-3
On 2 Jan 2019, at 9:50pm, Jonathan Moules <[hidden email]> wrote:

> Sorry, but which column is ambiguous? The users.item_id is a foreign key to the item_info.item_id - that's why it's a "REFERENCES" - why would I want to change it to be something else? Isn't the convention for FK's to have the same name across tables? That's what "USING" is for right? (or NATURAL, but I prefer to be explicit.) Happy to be corrected.

It may be that our careers developed with different ideas about how to use SQL.  You had a JOIN, both tables had a column "item_id", and a reference inside the JOIN to "item_id" would be ambiguous.  Since the SQL standard does not make it clear which table would be used, it would be possible for different implementations of SQL to think you meant different tables.

I understand that, in your example, the values would be the same.  But that doesn't explain to you what the optimizer thinks you're trying to do.  The simplest way to tell the optimizer what you need would be to rename one of the columns.  You could try both tables, see which solution was faster, and use that one.

However, I see other posters have gained better clarity for your problem.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Jonathan Moules-3
Hi Simon,

Thanks for that - I'm always trying to improve my SQL. I think I see
what you mean now.

Assuming my original query:

     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users group by item_id)
     USING (item_id)
     where item_id = ?;

There are three uses of "item_id".

* The first is the "group by", which unless my SQL is even worse than I
imagine, can only be referencing the users table.

* The second is "USING" - which is referencing both.

* The third is the one I guess you mean is ambiguous? My thinking was
that because item_id is going through the USING it wasn't ambiguous as
they're the same thing; though that's apparently not how the query
planner sees it, and hence your reference to ambiguity. That right?

So I tried using aliases (I'm assuming that removes the ambiguity), but
the query times remained at about 0.5s for both versions (whether
i.item_id or u.item_id):

     SELECT
         *
     FROM
         item_info i
     JOIN (select count(1) from users group by item_id) u
     USING (item_id)
     where u.item_id = ?;

Thanks again for clarifying, but after checking, it doesn't seem like it
was an ambiguity thing.
Cheers,
Jonathan

On 2019-01-02 22:04, Simon Slavin wrote:

> On 2 Jan 2019, at 9:50pm, Jonathan Moules <[hidden email]> wrote:
>
>> Sorry, but which column is ambiguous? The users.item_id is a foreign key to the item_info.item_id - that's why it's a "REFERENCES" - why would I want to change it to be something else? Isn't the convention for FK's to have the same name across tables? That's what "USING" is for right? (or NATURAL, but I prefer to be explicit.) Happy to be corrected.
> It may be that our careers developed with different ideas about how to use SQL.  You had a JOIN, both tables had a column "item_id", and a reference inside the JOIN to "item_id" would be ambiguous.  Since the SQL standard does not make it clear which table would be used, it would be possible for different implementations of SQL to think you meant different tables.
>
> I understand that, in your example, the values would be the same.  But that doesn't explain to you what the optimizer thinks you're trying to do.  The simplest way to tell the optimizer what you need would be to rename one of the columns.  You could try both tables, see which solution was faster, and use that one.
>
> However, I see other posters have gained better clarity for your problem.
>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Keith Medcalf
In reply to this post by David Raymond

However, if you want to do that then you want to use a correlated subquery as it is designed for retrieving correlated data by running a subquery per result row:

select *,
       (select count(*) from users where item_id = item_info.item_id) as count
  from item_info
 where item_id = ?;

and not a subselect in the FROM clause.  Tables in the "FROM" clause are siblings and exist independently of each other.  They are not generators ... (except in some special cases where they are, such as a table valued function).


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of David Raymond
>Sent: Wednesday, 2 January, 2019 13:56
>To: SQLite mailing list
>Subject: Re: [sqlite] Optimisation opportunity on subquery?
>
>Yup, there was a typo, but you guys know what he meant. He's saying
>there WHERE could be pushed through the join to the subquery.
>
>
>sqlite> explain query plan select * from item_info join (select
>item_id, count(1) from users group by item_id) using (item_id) where
>item_id = ?;
>QUERY PLAN
>|--MATERIALIZE 1
>|  `--SCAN TABLE users USING COVERING INDEX user__item_id__idx
>|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1
>(item_id=?)
>`--SCAN SUBQUERY 1
>
>sqlite> explain query plan select * from item_info join (select
>item_id, count(1) from users where item_id = ?) using (item_id) where
>item_id = ?;
>QUERY PLAN
>|--MATERIALIZE 1
>|  `--SEARCH TABLE users USING COVERING INDEX user__item_id__idx
>(item_id=?)
>|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1
>(item_id=?)
>`--SCAN SUBQUERY 1
>
>
>It basically comes down to not doing algebra, and proving you can use
>it for each case can be easy for a human, but error prone or long to
>do for a computer, especially for non-trivial cases. Here the WHERE
>clause is on one field, that field is used in an INNER join to the
>subquery, the sub query isn't used anywhere else in the larger query,
>the field that matches in the sub query is the group by target and
>not an aggregate field, etc. All those had to be true, and probably
>more things I'm not thinking of at the moment. So it becomes a not so
>trivial thing to identify and use safely.
>
>
>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Simon Slavin
>Sent: Wednesday, January 02, 2019 3:34 PM
>To: SQLite mailing list
>Subject: Re: [sqlite] Optimisation opportunity on subquery?
>
>On 2 Jan 2019, at 4:44pm, Jonathan Moules <jonathan-
>[hidden email]> wrote:
>
>>     SELECT
>>         *
>>     FROM
>>         item_info
>>     JOIN (select count(1) from users group by item_id)
>>     USING (item_id)
>>     where item_id = ?;
>
>You have an ambiguous column name, and I don't think SQLite is doing
>what you think it's doing.
>
>Please change the name of the column users.item_id to something else,
>then try your SELECT again.
>
>Simon.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Keith Medcalf
In reply to this post by Jonathan Moules-3

When you are executing the query:

SELECT *
  FROM item_info
  JOIN (select count(1)
          from users
      group by item_id)
 USING (item_id)
 where item_id = ?;

You are telling the SQL Database Engine (whatever it may happen to be, in this case SQLite3) that you want to take the table produced by:

select * from item_info;

and join it (an equijoin) against the table produced by running the query:

select count(1) from users group by item_id;

based on the equality of the column item_id in both tables.  If you run the two queries you will see that the second table DOES NOT produce a column called item_id.  Therefore you cannot join those tables and instead you get an error message telling you that the column item_id does not exist in both tables.

FROM tables are siblings not correlates.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.


>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Jonathan Moules
>Sent: Wednesday, 2 January, 2019 15:21
>To: [hidden email]
>Subject: Re: [sqlite] Optimisation opportunity on subquery?
>
>Hi Simon,
>
>Thanks for that - I'm always trying to improve my SQL. I think I see
>what you mean now.
>
>Assuming my original query:
>
>     SELECT
>         *
>     FROM
>         item_info
>     JOIN (select count(1) from users group by item_id)
>     USING (item_id)
>     where item_id = ?;
>
>There are three uses of "item_id".
>
>* The first is the "group by", which unless my SQL is even worse than
>I
>imagine, can only be referencing the users table.
>
>* The second is "USING" - which is referencing both.
>
>* The third is the one I guess you mean is ambiguous? My thinking was
>that because item_id is going through the USING it wasn't ambiguous
>as
>they're the same thing; though that's apparently not how the query
>planner sees it, and hence your reference to ambiguity. That right?
>
>So I tried using aliases (I'm assuming that removes the ambiguity),
>but
>the query times remained at about 0.5s for both versions (whether
>i.item_id or u.item_id):
>
>     SELECT
>         *
>     FROM
>         item_info i
>     JOIN (select count(1) from users group by item_id) u
>     USING (item_id)
>     where u.item_id = ?;
>
>Thanks again for clarifying, but after checking, it doesn't seem like
>it
>was an ambiguity thing.
>Cheers,
>Jonathan
>
>On 2019-01-02 22:04, Simon Slavin wrote:
>> On 2 Jan 2019, at 9:50pm, Jonathan Moules <jonathan-
>[hidden email]> wrote:
>>
>>> Sorry, but which column is ambiguous? The users.item_id is a
>foreign key to the item_info.item_id - that's why it's a "REFERENCES"
>- why would I want to change it to be something else? Isn't the
>convention for FK's to have the same name across tables? That's what
>"USING" is for right? (or NATURAL, but I prefer to be explicit.)
>Happy to be corrected.
>> It may be that our careers developed with different ideas about how
>to use SQL.  You had a JOIN, both tables had a column "item_id", and
>a reference inside the JOIN to "item_id" would be ambiguous.  Since
>the SQL standard does not make it clear which table would be used, it
>would be possible for different implementations of SQL to think you
>meant different tables.
>>
>> I understand that, in your example, the values would be the same.
>But that doesn't explain to you what the optimizer thinks you're
>trying to do.  The simplest way to tell the optimizer what you need
>would be to rename one of the columns.  You could try both tables,
>see which solution was faster, and use that one.
>>
>> However, I see other posters have gained better clarity for your
>problem.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>users
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Jonathan Moules-3
Gah, sorry. Another typo. I really should be more awake when I post to
this list. The non-simplified code does have the item_id on the subquery
(otherwise it simply wouldn't execute at all of course). So:

SELECT *
   FROM item_info
   JOIN (
     select
       count(1) as num,
       item_id
     from users
     group by item_id)
  USING (item_id)
  where item_id = ?;



On 2019-01-02 22:56, Keith Medcalf wrote:

> When you are executing the query:
>
> SELECT *
>    FROM item_info
>    JOIN (select count(1)
>            from users
>        group by item_id)
>   USING (item_id)
>   where item_id = ?;
>
> You are telling the SQL Database Engine (whatever it may happen to be, in this case SQLite3) that you want to take the table produced by:
>
> select * from item_info;
>
> and join it (an equijoin) against the table produced by running the query:
>
> select count(1) from users group by item_id;
>
> based on the equality of the column item_id in both tables.  If you run the two queries you will see that the second table DOES NOT produce a column called item_id.  Therefore you cannot join those tables and instead you get an error message telling you that the column item_id does not exist in both tables.
>
> FROM tables are siblings not correlates.
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
>
>
>> -----Original Message-----
>> From: sqlite-users [mailto:sqlite-users-
>> [hidden email]] On Behalf Of Jonathan Moules
>> Sent: Wednesday, 2 January, 2019 15:21
>> To: [hidden email]
>> Subject: Re: [sqlite] Optimisation opportunity on subquery?
>>
>> Hi Simon,
>>
>> Thanks for that - I'm always trying to improve my SQL. I think I see
>> what you mean now.
>>
>> Assuming my original query:
>>
>>      SELECT
>>          *
>>      FROM
>>          item_info
>>      JOIN (select count(1) from users group by item_id)
>>      USING (item_id)
>>      where item_id = ?;
>>
>> There are three uses of "item_id".
>>
>> * The first is the "group by", which unless my SQL is even worse than
>> I
>> imagine, can only be referencing the users table.
>>
>> * The second is "USING" - which is referencing both.
>>
>> * The third is the one I guess you mean is ambiguous? My thinking was
>> that because item_id is going through the USING it wasn't ambiguous
>> as
>> they're the same thing; though that's apparently not how the query
>> planner sees it, and hence your reference to ambiguity. That right?
>>
>> So I tried using aliases (I'm assuming that removes the ambiguity),
>> but
>> the query times remained at about 0.5s for both versions (whether
>> i.item_id or u.item_id):
>>
>>      SELECT
>>          *
>>      FROM
>>          item_info i
>>      JOIN (select count(1) from users group by item_id) u
>>      USING (item_id)
>>      where u.item_id = ?;
>>
>> Thanks again for clarifying, but after checking, it doesn't seem like
>> it
>> was an ambiguity thing.
>> Cheers,
>> Jonathan
>>
>> On 2019-01-02 22:04, Simon Slavin wrote:
>>> On 2 Jan 2019, at 9:50pm, Jonathan Moules <jonathan-
>> [hidden email]> wrote:
>>>> Sorry, but which column is ambiguous? The users.item_id is a
>> foreign key to the item_info.item_id - that's why it's a "REFERENCES"
>> - why would I want to change it to be something else? Isn't the
>> convention for FK's to have the same name across tables? That's what
>> "USING" is for right? (or NATURAL, but I prefer to be explicit.)
>> Happy to be corrected.
>>> It may be that our careers developed with different ideas about how
>> to use SQL.  You had a JOIN, both tables had a column "item_id", and
>> a reference inside the JOIN to "item_id" would be ambiguous.  Since
>> the SQL standard does not make it clear which table would be used, it
>> would be possible for different implementations of SQL to think you
>> meant different tables.
>>> I understand that, in your example, the values would be the same.
>> But that doesn't explain to you what the optimizer thinks you're
>> trying to do.  The simplest way to tell the optimizer what you need
>> would be to rename one of the columns.  You could try both tables,
>> see which solution was faster, and use that one.
>>> However, I see other posters have gained better clarity for your
>> problem.
>>> Simon.
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>> users
>>
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Optimisation opportunity on subquery?

Hick Gunter
In reply to this post by Jonathan Moules-3
It would be easier to reproduce if you had checked the provided sample code for errors first...

.) Error: near ")": syntax error
.) Error: cannot join using column item_id - column not present in both tables


Your first query specifies a full table scan over the users table (via the covering index) to create an intermediate table of counts per item_id, and then selects only one row.

Your second query specifies a partial index scan to create an intermediate table of 1 row with the required item_id, which is then joined.


-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Jonathan Moules
Gesendet: Mittwoch, 02. Jänner 2019 17:44
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] [sqlite] Optimisation opportunity on subquery?

Hi List,

The below seems to my very-non-expert mind like there's scope for query-plan optimisation.

I have two tables (simplified below):

CREATE TABLE users (
     item_id   TEXT REFERENCES item_info (item_id)
                       NOT NULL
                       COLLATE NOCASE,
     some_data     TEXT,
);

CREATE INDEX users__item_id__idx ON users (
     item_id
);

CREATE TABLE item_info (
     item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
                       NOT NULL
                       COLLATE NOCASE,
     more_data     TEXT
);

There are about 1 million records in users and 100,000 records in item_info.

These queries are all fast, taking about 0.002s:
     select * from item_info where item_id = ?;

     select count(1) from users group by item_id;

     select count(1) from users where item_id = ?;

But when I try and join them together, they're much slower at about 0.5s.
     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users group by item_id)
     USING (item_id)
     where item_id = ?;

I kind of expected SQLite would figure out that the outer WHERE clause also applies to the subquery given the combination of USING and GROUP BY means it has to apply anyway.

If I explicitly include the WHERE inside the subquery, the EXPLAIN QUERY PLAN is identical, but it's back to the expected fast speed (0.002s):
     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users WHERE item_id = ?)
     USING (item_id)
     where item_id = ?;

sqlite 3.24.0

Cheers,

Jonathan



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Optimisation opportunity on subquery?

Hick Gunter
Note that you may pass different item_id values to the second query, which would then return 0 rows, but quickly. If you need to reference the same parameter in more than one location inside the SQL Statement, use explicit names or numbers

     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users WHERE item_id = ?1)
     USING (item_id)
     where item_id = ?1;


-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Hick Gunter
Gesendet: Donnerstag, 03. Jänner 2019 07:48
An: 'SQLite mailing list' <[hidden email]>
Betreff: Re: [sqlite] [EXTERNAL] Optimisation opportunity on subquery?

It would be easier to reproduce if you had checked the provided sample code for errors first...

.) Error: near ")": syntax error
.) Error: cannot join using column item_id - column not present in both tables


Your first query specifies a full table scan over the users table (via the covering index) to create an intermediate table of counts per item_id, and then selects only one row.

Your second query specifies a partial index scan to create an intermediate table of 1 row with the required item_id, which is then joined.


-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Jonathan Moules
Gesendet: Mittwoch, 02. Jänner 2019 17:44
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] [sqlite] Optimisation opportunity on subquery?

Hi List,

The below seems to my very-non-expert mind like there's scope for query-plan optimisation.

I have two tables (simplified below):

CREATE TABLE users (
     item_id   TEXT REFERENCES item_info (item_id)
                       NOT NULL
                       COLLATE NOCASE,
     some_data     TEXT,
);

CREATE INDEX users__item_id__idx ON users (
     item_id
);

CREATE TABLE item_info (
     item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
                       NOT NULL
                       COLLATE NOCASE,
     more_data     TEXT
);

There are about 1 million records in users and 100,000 records in item_info.

These queries are all fast, taking about 0.002s:
     select * from item_info where item_id = ?;

     select count(1) from users group by item_id;

     select count(1) from users where item_id = ?;

But when I try and join them together, they're much slower at about 0.5s.
     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users group by item_id)
     USING (item_id)
     where item_id = ?;

I kind of expected SQLite would figure out that the outer WHERE clause also applies to the subquery given the combination of USING and GROUP BY means it has to apply anyway.

If I explicitly include the WHERE inside the subquery, the EXPLAIN QUERY PLAN is identical, but it's back to the expected fast speed (0.002s):
     SELECT
         *
     FROM
         item_info
     JOIN (select count(1) from users WHERE item_id = ?)
     USING (item_id)
     where item_id = ?;

sqlite 3.24.0

Cheers,

Jonathan



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation opportunity on subquery?

Keith Medcalf
In reply to this post by Jonathan Moules-3

On Wednesday, 2 January, 2019 16:58, Jonathan Moules <[hidden email]> wrote:

>Gah, sorry. Another typo. I really should be more awake when I post
>to this list. The non-simplified code does have the item_id on the
>subquery (otherwise it simply wouldn't execute at all of course). So:

>SELECT *
>   FROM item_info
>   JOIN (
>     select
>       count(1) as num,
>       item_id
>     from users
>     group by item_id)
>  USING (item_id)
>  where item_id = ?;

Now perhaps we are getting somewhere.  So now what exactly is your complaint?  You have asked for the following join to occur:

JOIN this table, which you can see by the following command:
  select * from item_info
with the table that you can see by the following command:
  select count(1) as num, item_id from users group by item_id
using the common column item_id in both tables
and then to return only the row where the item_id is the specific one provided.

So, the ONLY way to solve this is to generate the table on the RHS of the join first, and then join it to the item_info table, and then return only the row you have asked for.  This produces the following plan:

CREATE TABLE users (
     item_id   TEXT REFERENCES item_info (item_id)
                       NOT NULL
                       COLLATE NOCASE,
     some_data     TEXT
);

CREATE INDEX users__item_id__idx ON users (
     item_id
);

CREATE TABLE item_info (
     item_id   TEXT PRIMARY KEY ON CONFLICT IGNORE
                       NOT NULL
                       COLLATE NOCASE,
     more_data     TEXT
);
.eqp full
SELECT *
   FROM item_info
   JOIN (
     select
       count(1) as num,
       item_id
     from users
     group by item_id)
  USING (item_id)
  where item_id = ?;
QUERY PLAN
|--MATERIALIZE 1
|  `--SCAN TABLE users USING COVERING INDEX users__item_id__idx (~1048576 rows)
|--SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?) (~1 row)
`--SCAN SUBQUERY 1 (~96 rows)
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     77    0                    00  Start at 77
1     Integer        45    1     0                    00  r[1]=45
2     Once           0     45    0                    00  materialize "subquery_1"
3     OpenEphemeral  1     2     0                    00  nColumn=2
4     Noop           3     1     0                    00
5     Integer        0     5     0                    00  r[5]=0; clear abort flag
6     Null           0     8     8                    00  r[8..8]=NULL
7     Gosub          7     42    0                    00
8     OpenRead       4     3     0     k(2,NOCASE,)   00  root=3 iDb=0; users__item_id__idx
9     ColumnsUsed    4     0     0     1              00
10    Explain        10    0     0     SCAN TABLE users USING COVERING INDEX users__item_id__idx (~1048576 rows)  00
11    Noop           0     0     0                    00  Begin WHERE-loop0: users
12    Rewind         4     28    10    0              00
13      Noop           0     0     0                    00  Begin WHERE-core
14      Column         4     0     9                    00  r[9]=users.item_id
15      Compare        8     9     1     k(1,NOCASE)    00  r[8] <-> r[9]
16      Jump           17    21    17                   00
17      Move           9     8     1                    00  r[8]=r[9]
18      Gosub          6     33    0                    00  output one row
19      IfPos          5     45    0                    00  if r[5]>0 then r[5]-=0, goto 45; check abort flag
20      Gosub          7     42    0                    00  reset accumulator
21      Integer        1     10    0                    00  r[10]=1
22      AggStep        0     10    2     count(1)       01  accum=r[2] step(r[10])
23      If             4     25    0                    00
24      Column         4     0     3                    00  r[3]=users.item_id
25      Integer        1     4     0                    00  r[4]=1; indicate data in accumulator
26      Noop           0     0     0                    00  End WHERE-core
27    Next           4     13    0                    01
28    Noop           0     0     0                    00  End WHERE-loop0: users
29    Gosub          6     33    0                    00  output final row
30    Goto           0     45    0                    00
31    Integer        1     5     0                    00  r[5]=1; set abort flag
32    Return         6     0     0                    00
33    IfPos          4     35    0                    00  if r[4]>0 then r[4]-=0, goto 35; Groupby result generator entry point
34    Return         6     0     0                    00
35    AggFinal       2     1     0     count(1)       00  accum=r[2] N=1
36    SCopy          2     11    0                    00  r[11]=r[2]
37    SCopy          3     12    0                    00  r[12]=r[3]
38    MakeRecord     11    2     10                   00  r[10]=mkrec(r[11..12])
39    NewRowid       1     13    0                    00  r[13]=rowid
40    Insert         1     10    13                   08  intkey=r[13] data=r[10]
41    Return         6     0     0                    00  end groupby result generator
42    Null           0     2     3                    00  r[2..3]=NULL
43    Integer        0     4     0                    00  r[4]=0; indicate accumulator empty
44    Return         7     0     0                    00
45    Return         1     0     0                    00  end subquery_1
46    OpenRead       0     4     0     2              02  root=4 iDb=0; item_info
47    ColumnsUsed    0     0     0     3              00
48    OpenRead       5     5     0     k(1,NOCASE)    02  root=5 iDb=0; sqlite_autoindex_item_info_1
49    ColumnsUsed    5     0     0     1              00
50    Explain        50    0     0     SEARCH TABLE item_info USING INDEX sqlite_autoindex_item_info_1 (item_id=?) (~1 row)  00
51    Noop           0     0     0                    00  Begin WHERE-loop0: item_info
52    CursorHint     5     0     0     EQ(c0,expr)    00
53    Variable       1     14    0                    00  r[14]=parameter(1,)
54    IsNull         14    75    0                    00  if r[14]==NULL goto 75
55    Affinity       14    1     0     B              00  affinity(r[14])
56    SeekGE         5     75    14    1              00  key=r[14]
57    IdxGT          5     75    14    1              00  key=r[14]
58    DeferredSeek   5     0     0                    00  Move 0 to 5.rowid if needed
59    Explain        59    0     0     SCAN SUBQUERY 1 (~96 rows)  00
60    Noop           0     0     0                    00  Begin WHERE-loop1: subquery_1
61    Column         5     0     15                   00  r[15]=item_info.item_id
62    CursorHint     1     0     0     EQ(r[15],c1)   00
63    Rewind         1     75    0                    00
64      Column         5     0     16                   00  r[16]=item_info.item_id
65      Column         1     1     17                   00  r[17]=subquery_1.item_id
66      Ne             17    73    16    (NOCASE)       51  if r[16]!=r[17] goto 73
67      Noop           0     0     0                    00  Begin WHERE-core
68      Column         5     0     18                   00  r[18]=item_info.item_id
69      Column         0     1     19                   00  r[19]=item_info.more_data
70      Column         1     0     20                   00  r[20]=subquery_1.num
71      ResultRow      18    3     0                    00  output=r[18..20]
72      Noop           0     0     0                    00  End WHERE-core
73    Next           1     64    0                    01
74    Noop           0     0     0                    00  End WHERE-loop1: subquery_1
75    Noop           0     0     0                    00  End WHERE-loop0: item_info
76    Halt           0     0     0                    00
77    Transaction    0     0     3     0              01  usesStmtJournal=0
78    Goto           0     1     0                    00

Which does exactly what you asked in the most efficient manner possible.

What is it that you think it ought to be doing instead?

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.





_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Optimisation opportunity on subquery?

Hick Gunter
>-----Ursprüngliche Nachricht-----
>Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Keith Medcalf
>On Wednesday, 2 January, 2019 16:58, Jonathan Moules <[hidden email]<mailto:[hidden email]>> wrote:

>

>>Gah, sorry. Another typo. I really should be more awake when I post to

>>this list. The non-simplified code does have the item_id on the

>>subquery (otherwise it simply wouldn't execute at all of course). So:

>>

>>SELECT *

>>   FROM item_info

>>   JOIN (

>>     select

>>       count(1) as num,

>>       item_id

>>     from users

>>     group by item_id)

>>  USING (item_id)

>>  where item_id = ?;

>

>Now perhaps we are getting somewhere.  So now what exactly is your complaint?

>



He first asked for (1) "compute all the totals and then select the one matching the item I provide" (table JOIN covering index full scan)



He later asked for (2) "compute the total for item#1 and then select it if it matches item#2" (table JOIN partial index scan), which is faster and returns the correct value only if item#1 == item#2 at run time.


He wanted (3) "select the item I provide and then compute a total for it" (select correlated subquery)



His complaint is that SQLite rightly declines to transform (1) into (3), instead of doing what he asked for and not what he wanted.





___________________________________________
Gunter Hick | Software Engineer | Scientific Games International GmbH<http://www.scigames.at> | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users