Unexpected optimization

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Unexpected optimization

Max Vlasov
Hi,

I noticed an unexpected optimization at the sqlite side.
Currently I can not reproduce this with some arbitrary test data (probably
I will eventually). Anyway the logic behind this (pseudo-code query)

Select .... , (Select count(*) from LookUpTable where
LookUpTable.Value=TableValue) as StatCount from
(
  ... Select TableValue, ... left join  ... left join
  where <InnerCondition>
)
   where StatCount  = ..

The aggregate lookup (Select count()) is relatively expensive to perform
and involves a virtual table on my side (LookUpTable). So the goal of
<InnerCondition> is also to narrow the output of the data for this lookup.
Most of the time (including my synthetic tests) the filter indeed works the
expected way (Filtering with <InnerCondition> then performing the aggregate
only for the suitable), but for some of queries where there are several
joins sqlite starts to perform the lookup before applying <InnerCondition>
so I get my expensive calculations used for all rows of the inner joined
table and then filtering with <InnerCondition>. I double checked this since
the LookUpTable is my virtual table so I can set a breakpoint and inspect
the passed value. Ironically, when I remove the outer condition ( where
StatCount ..  ) from the query in question, it starts to work much faster.

I suspect this might be related to how I respond to the constraint cost
requests from sqlite. For this virtual table the possible results might be
1 or a very big value. I see that the value 1 is indeed visited for this
query and probably sqlite might assume some absolute minimum cost for this
look-up. But when I change it to a bigger value (still lower than "a very
big value" also used), the query plan will not change.

Here are summary of Explain Query Plan (rea is my virtual table, there are
4 joins in this query, sqlite 3.21.0).

The Query with Outer condition "where StatCount  = .."
SCAN TABLE
SEARCH TABLE (2 times)
EXECUTE CORRELATED SCALAR SUBQUERY 1
SCAN TABLE Rea VIRTUAL TABLE INDEX 2:
SEARCH TABLE (2 times)
EXECUTE CORRELATED SCALAR SUBQUERY 2
SCAN TABLE Rea VIRTUAL TABLE INDEX 2:

The same query when I just removed the outer "where StatCount  = .."
SCAN TABLE...
SEARCH TABLE (4 times)
EXECUTE CORRELATED SCALAR SUBQUERY 1
SCAN TABLE Rea VIRTUAL TABLE INDEX 2:


Can I manually affect the plan for this query or probably by further
tweaking the virtual table costs?

Thanks



Thanks
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Unexpected optimization

Hick Gunter
SQLite uses the result of ANALYZE for computing costs of native tables and your reported costs for queries from virtual tables. You are expected to return estimated disk access counts for the constraints given (plus more detail, depending on SQLite version).

Are you sure you need LEFT joins as opposed to inner joins?

You can also use CROSS JOIN to force the QP to not reorder the LHS and RHS tables

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Max Vlasov
Gesendet: Donnerstag, 22. März 2018 13:44
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] [sqlite] Unexpected optimization

Hi,

I noticed an unexpected optimization at the sqlite side.
Currently I can not reproduce this with some arbitrary test data (probably I will eventually). Anyway the logic behind this (pseudo-code query)

Select .... , (Select count(*) from LookUpTable where
LookUpTable.Value=TableValue) as StatCount from (
  ... Select TableValue, ... left join  ... left join
  where <InnerCondition>
)
   where StatCount  = ..

The aggregate lookup (Select count()) is relatively expensive to perform and involves a virtual table on my side (LookUpTable). So the goal of <InnerCondition> is also to narrow the output of the data for this lookup.
Most of the time (including my synthetic tests) the filter indeed works the expected way (Filtering with <InnerCondition> then performing the aggregate only for the suitable), but for some of queries where there are several joins sqlite starts to perform the lookup before applying <InnerCondition> so I get my expensive calculations used for all rows of the inner joined table and then filtering with <InnerCondition>. I double checked this since the LookUpTable is my virtual table so I can set a breakpoint and inspect the passed value. Ironically, when I remove the outer condition ( where StatCount ..  ) from the query in question, it starts to work much faster.

I suspect this might be related to how I respond to the constraint cost requests from sqlite. For this virtual table the possible results might be
1 or a very big value. I see that the value 1 is indeed visited for this query and probably sqlite might assume some absolute minimum cost for this look-up. But when I change it to a bigger value (still lower than "a very big value" also used), the query plan will not change.

Here are summary of Explain Query Plan (rea is my virtual table, there are
4 joins in this query, sqlite 3.21.0).

The Query with Outer condition "where StatCount  = .."
SCAN TABLE
SEARCH TABLE (2 times)
EXECUTE CORRELATED SCALAR SUBQUERY 1
SCAN TABLE Rea VIRTUAL TABLE INDEX 2:
SEARCH TABLE (2 times)
EXECUTE CORRELATED SCALAR SUBQUERY 2
SCAN TABLE Rea VIRTUAL TABLE INDEX 2:

The same query when I just removed the outer "where StatCount  = .."
SCAN TABLE...
SEARCH TABLE (4 times)
EXECUTE CORRELATED SCALAR SUBQUERY 1
SCAN TABLE Rea VIRTUAL TABLE INDEX 2:


Can I manually affect the plan for this query or probably by further tweaking the virtual table costs?

Thanks



Thanks
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Unexpected optimization

Richard Hipp-3
In reply to this post by Max Vlasov
Max,

Since you appear to be writing your own virtual tables, you could
probably benefit from becoming more familiar with the internal
workings of SQLite, and especially the ".wheretrace" and
".selecttrace" commands of the CLI.  To enable those commands, build a
new copy of the CLI that includes -DSQLITE_ENABLE_SELECTTRACE and
-DSQLITE_ENABLE_WHERETRACE.

The command ".selecttrace 0xffff" will print out the parse tree of
your SQL statement at various points as it is transformed by the query
optimizer.  By viewing this parse tree, you might be better able to
understand the transformations that are taking place, which might give
additional insights into what it going astray for you.

The latest pre-release snapshot for 3.23.0 contains a couple of new
optimizations related to LEFT JOIN.  Please also try your code with
the pre-release snapshot to see if it helps or hurts or makes no
difference.  And please report back what you find, regardless of your
findings.

The ".wheretrace 0xfff" command prints out the steps used by the query
planner as it scores various execution strategies looking for the
fastest way to run your query.  It should clearly show the costs
returned by your xBestIndex implementation, how decisions are made
based on those costs, and help you to see how changing those costs
might result in better plans.

The outputs from .selecttrace and .wheretrace are undocumented.  They
are subject to change.  And they do change in incompatible ways from
time to time.  You'll need to look at the source code to fully
understand what the outputs mean.

Note that the cost numbers printed by .wheretrace are logarithmic.
See the description of the LogEst numbers at
https://www.sqlite.org/src/artifact?ln=755-778&name=7e9deb145c110289
for additional information. There is a simple command-line program at
https://www.sqlite.org/src/artifact/11346aa019e2e77a that you can
compile and use to convert values between LogEst and traditional
base-10 decimal numbers.

There is older debugging documentation at
https://www.sqlite.org/debugging.html that might give additional
hints.

As you work through this problem, please provide feedback so that we
can improve the documentation for the next person who faces similar
issues.

On 3/22/18, Max Vlasov <[hidden email]> wrote:

> Hi,
>
> I noticed an unexpected optimization at the sqlite side.
> Currently I can not reproduce this with some arbitrary test data (probably
> I will eventually). Anyway the logic behind this (pseudo-code query)
>
> Select .... , (Select count(*) from LookUpTable where
> LookUpTable.Value=TableValue) as StatCount from
> (
>   ... Select TableValue, ... left join  ... left join
>   where <InnerCondition>
> )
>    where StatCount  = ..
>
> The aggregate lookup (Select count()) is relatively expensive to perform
> and involves a virtual table on my side (LookUpTable). So the goal of
> <InnerCondition> is also to narrow the output of the data for this lookup.
> Most of the time (including my synthetic tests) the filter indeed works the
> expected way (Filtering with <InnerCondition> then performing the aggregate
> only for the suitable), but for some of queries where there are several
> joins sqlite starts to perform the lookup before applying <InnerCondition>
> so I get my expensive calculations used for all rows of the inner joined
> table and then filtering with <InnerCondition>. I double checked this since
> the LookUpTable is my virtual table so I can set a breakpoint and inspect
> the passed value. Ironically, when I remove the outer condition ( where
> StatCount ..  ) from the query in question, it starts to work much faster.
>
> I suspect this might be related to how I respond to the constraint cost
> requests from sqlite. For this virtual table the possible results might be
> 1 or a very big value. I see that the value 1 is indeed visited for this
> query and probably sqlite might assume some absolute minimum cost for this
> look-up. But when I change it to a bigger value (still lower than "a very
> big value" also used), the query plan will not change.
>
> Here are summary of Explain Query Plan (rea is my virtual table, there are
> 4 joins in this query, sqlite 3.21.0).
>
> The Query with Outer condition "where StatCount  = .."
> SCAN TABLE
> SEARCH TABLE (2 times)
> EXECUTE CORRELATED SCALAR SUBQUERY 1
> SCAN TABLE Rea VIRTUAL TABLE INDEX 2:
> SEARCH TABLE (2 times)
> EXECUTE CORRELATED SCALAR SUBQUERY 2
> SCAN TABLE Rea VIRTUAL TABLE INDEX 2:
>
> The same query when I just removed the outer "where StatCount  = .."
> SCAN TABLE...
> SEARCH TABLE (4 times)
> EXECUTE CORRELATED SCALAR SUBQUERY 1
> SCAN TABLE Rea VIRTUAL TABLE INDEX 2:
>
>
> Can I manually affect the plan for this query or probably by further
> tweaking the virtual table costs?
>
> Thanks
>
>
>
> Thanks
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users