Hints for the query planner

classic Classic list List threaded Threaded
37 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Hints for the query planner

Richard Hipp-3
There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic "unlikely()" function.  Subexpressions contained within
"unlikely()" are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string "bach" seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Philip Bennefall
Hi Richard,

What about "probability" or "likelyhood"? This works in both the case where
the likelyhood is great as well as when it is low. From the list you
provided, I would pick "unlikely".

Kind regards,

Philip Bennefall
----- Original Message -----
From: "Richard Hipp" <[hidden email]>
To: "General Discussion of SQLite Database" <[hidden email]>
Sent: Tuesday, September 10, 2013 9:26 PM
Subject: [sqlite] Hints for the query planner


There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic "unlikely()" function.  Subexpressions contained within
"unlikely()" are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string "bach" seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Marc L. Allen
In reply to this post by Richard Hipp-3
As I was reading this, I said to myself, "what they really need is a confidence value."  Then I read the end and, there it was!  A confidence value.  Ok.. not exactly confidence, but I think you get my meaning.

It seems to me that you're allowing the query writer to substitute personal knowledge of the DB for knowledge based on ANALYZE or other statistical indexes.  So, I'm all in favor of allowing that second argument.

If so, I would suggest "confidence(exp, confidence_value)".  Or, perhaps, "likelihood(..)"  Likely is fine, or you might even establish several names with built-in defaults... e.g. "likely(xxx)" might be "confidence(xxx, .75)" and "unlikely(xxx)" might be "confidence(xxx, .25)"  You've got "rarely," "mostly," and a whole suite of other synonyms.




This email and any attachments are only for use by the intended recipient(s) and may contain legally privileged, confidential, proprietary or otherwise private information. Any unauthorized use, reproduction, dissemination, distribution or other disclosure of the contents of this e-mail or its attachments is strictly prohibited. If you have received this email in error, please notify the sender immediately and delete the original.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

James Berry
In reply to this post by Richard Hipp-3

On Sep 10, 2013, at 12:26 PM, Richard Hipp <[hidden email]> wrote:

> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.


I like the optional second parameter. Apart from the obvious change to likelihood, which somebody else suggested, but which is less self documenting in the one-argument case, I'd suggest that you actually add two words: likely and unlikely, and whose second parameters are the inverse of each other, crossing at 0.5. So an expression could then, in the simplest case, be labeled LIKELY or UNLIKELY, with second parameter defaulting to 1.0 (or to whatever value you feel is appropriate), but allowing the user to specify a lesser likelihood by lowering that value. LIKELY(expr, 0) would mean the same as UNLIKELY(expr) and UNLIKELY(expr, 1), and UNLIKELY(expr, 0) would be the same as LIKELY(expr, 1), etc.

James
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Roman Fleysher
In reply to this post by Richard Hipp-3
In Bayesian statistics there is a term "prior", prior probability.


________________________________________
From: [hidden email] [[hidden email]] on behalf of Richard Hipp [[hidden email]]
Sent: Tuesday, September 10, 2013 3:26 PM
To: General Discussion of SQLite Database
Subject: [sqlite] Hints for the query planner

There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic "unlikely()" function.  Subexpressions contained within
"unlikely()" are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string "bach" seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Tim Streater-3
In reply to this post by Richard Hipp-3
On 10 Sep 2013 at 20:26, Richard Hipp <[hidden email]> wrote:

> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.
likelihood (EXPR, value)




--
Cheers  --  Tim

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Eric Minbiole-2
In reply to this post by Richard Hipp-3
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
>
I quite like (2) "selective".  I think it's reasonably descriptive on its
own, and also works well with the optional second argument.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Scott Robison-2
I think I prefer something along the lines of "unlikely" or "likely". The
problem with a term like "selective" (at least in my brain) is that it
doesn't imply (for the single argument version) in what way it is being
selective.

If a negative form of the magic function is used ("unlikely", "seldom",
etc) I would suggest considering inverting the optional second parameter.
In other words, 0.05 would become 0.95. In my opinion, that reads better:
"unlikely(COLUMN LIKE '%pattern%', 0.95)" reads "it is unlikely the
expression will be true 95% of the time".

In like fashion, a positive form of the magic function would keep the
current meaning of the optional second parameter.

Just ideas, I'm not married to them. Thanks for the discussion.

SDR


On Tue, Sep 10, 2013 at 4:17 PM, Eric Minbiole <[hidden email]> wrote:

> > (1)  unlikely(EXPR)
> > (2)  selective(EXPR)
> > (3)  seldom(EXPR)
> > (4)  seldom_true(EXPR)
> > (5)  usually_not_true(EXPR)
> >
> >
> I quite like (2) "selective".  I think it's reasonably descriptive on its
> own, and also works well with the optional second argument.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Simon Slavin-3
In reply to this post by Richard Hipp-3

On 10 Sep 2013, at 10:48pm, Tim Streater <[hidden email]> wrote:

> likelihood (EXPR, value)

Best I've seen so far.  I know it makes no sense without the second parameter but I think if you're going to make use of a special non-standard optimisation system you can be expected to know exactly what it means.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Stephan Beal-3
In reply to this post by Richard Hipp-3
Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05)

(sent from a mobile device - please excuse brevity, typos, and top-posting)
----- stephan beal
http://wanderinghorse.net
On Sep 10, 2013 11:49 PM, "Tim Streater" <[hidden email]> wrote:

> On 10 Sep 2013 at 20:26, Richard Hipp <[hidden email]> wrote:
>
> > SURVEY QUESTION:
> >
> > The question for today is what to call this magic hint function:
> >
> > (1)  unlikely(EXPR)
> > (2)  selective(EXPR)
> > (3)  seldom(EXPR)
> > (4)  seldom_true(EXPR)
> > (5)  usually_not_true(EXPR)
> >
> > Please feel free to suggest other names if you think of any.
>
> likelihood (EXPR, value)
>
>
>
>
> --
> Cheers  --  Tim
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

David de Regt
In reply to this post by Simon Slavin-3
Seconded.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Simon Slavin
Sent: Tuesday, September 10, 2013 3:46 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Hints for the query planner


On 10 Sep 2013, at 10:48pm, Tim Streater <[hidden email]> wrote:

> likelihood (EXPR, value)

Best I've seen so far.  I know it makes no sense without the second parameter but I think if you're going to make use of a special non-standard optimisation system you can be expected to know exactly what it means.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Stephan Beal-3
In reply to this post by Stephan Beal-3
Or...

unlikely(expr (, prob=0.05)) ==> likelihood (...)

(sent from a mobile device - please excuse brevity, typos, and top-posting)
----- stephan beal
http://wanderinghorse.net
On Sep 11, 2013 12:51 AM, "Stephan Beal" <[hidden email]> wrote:

> Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05)
>
> (sent from a mobile device - please excuse brevity, typos, and
> top-posting)
> ----- stephan beal
> http://wanderinghorse.net
> On Sep 10, 2013 11:49 PM, "Tim Streater" <[hidden email]> wrote:
>
>> On 10 Sep 2013 at 20:26, Richard Hipp <[hidden email]> wrote:
>>
>> > SURVEY QUESTION:
>> >
>> > The question for today is what to call this magic hint function:
>> >
>> > (1)  unlikely(EXPR)
>> > (2)  selective(EXPR)
>> > (3)  seldom(EXPR)
>> > (4)  seldom_true(EXPR)
>> > (5)  usually_not_true(EXPR)
>> >
>> > Please feel free to suggest other names if you think of any.
>>
>> likelihood (EXPR, value)
>>
>>
>>
>>
>> --
>> Cheers  --  Tim
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Keith Medcalf
In reply to this post by Stephan Beal-3

How about three, where two are just overloaded, or rather syntactic sugar, for the main declaration:

unlikely(expr)
likely(expr)
likelihood(expr, rate)

Where unlikely(expr) -> likelihood(expr, 0.05)
      likely(expr)   -> likelihood(expr, 0.95)

I would presume that the rate is the expected selection rate of expr over the entire table, so the correctly computed value of the rate for a table A of a given expr, for example:

select * from A where col LIKE '%bach%'

would be

select * from A where likelihood(col like '%bach%', (select count(*) from a where col like '%bach%')/(select count(*) from a))
and is not related to the application of other conditionals not included in the expr itself.

So, if there are usable statistics available, should the likelihood given be ignored; or, should likelihood completely override the statistical input to the optimizer?  I vote that the hint should only be used if no other statistical data is available.

> Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05)
>
> (sent from a mobile device - please excuse brevity, typos, and top-
> posting)
> ----- stephan beal
> http://wanderinghorse.net
> On Sep 10, 2013 11:49 PM, "Tim Streater" <[hidden email]> wrote:
>
> > On 10 Sep 2013 at 20:26, Richard Hipp <[hidden email]> wrote:
> >
> > > SURVEY QUESTION:
> > >
> > > The question for today is what to call this magic hint function:
> > >
> > > (1)  unlikely(EXPR)
> > > (2)  selective(EXPR)
> > > (3)  seldom(EXPR)
> > > (4)  seldom_true(EXPR)
> > > (5)  usually_not_true(EXPR)
> > >
> > > Please feel free to suggest other names if you think of any.
> >
> > likelihood (EXPR, value)
> >
> >
> >
> >
> > --
> > Cheers  --  Tim
> >
> > _______________________________________________
> > sqlite-users mailing list
> > [hidden email]
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> >
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Constantine Yannakopoulos
In reply to this post by Richard Hipp-3
Hello Dr Hipp,

First of all, I apologize for this rather off-topic suggestion knowing that
you may have already implemented the syntax you describe, but there is an
IMHO good reason for it, read ahead.

On Tue, Sep 10, 2013 at 10:26 PM, Richard Hipp <[hidden email]> wrote:

> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%')
>    AND composer.cid=track.cid
>    AND album.aid=track.aid;
>

I would prefer that the planner hint is not interleaved inside normal SQL
syntax. Instead I propose a special comment-like syntax instead, as
Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">":

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
> /*> unlikely */
>  AND composer.cid=track.cid AND album.aid=track.aid;
>

or:

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
> --> unlikely
>    AND composer.cid=track.cid
>    AND album.aid=track.aid;


If the hint is to be applied to an expression that combines many column
predicates with AND (I am not sure if this actually makes sense):

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%'
>    AND composer.cid=track.cid)
>    AND album.aid=track.aid;
>

then a -normally redundant- pair of parentheses can be used to specify the
scope of the hint:

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /*> unlikely */
>    AND album.aid=track.aid;
>

The SQLite SQL parser will have to look for exactly "/*>" or "-->" without
whitespace between the characters, so it can easily tell a planner hint
from a plain comment with a single character read-ahead. Also, the fact
that hints are "transparent" to the SQL syntax will allow the query parser
to handle them in an "orthogonal" way (e.g. a small separate parser for
hints) to normal SQL parsing, IMO making handling of any future hints
easier to add.

The main reason for this proposal is that the planner hint will be ignored
by default by other SQL parsers without the need to modify them, which in
some cases may not even be possible. For instance it will allow someone to
write SQL that is valid in databases of alternative DB vendors and still
provide planner hints when the DB vendor is SQLite (that is why I replaced
"+" with ">", to avoid conflicts with a hypothetical alternate Oracle query
optimizer) without having to modify the SQL in the application code to
remove the hints. This is a property of the Oracle optimizer hint syntax I
have always appreciated when writing SQL that is to be executed in
databases of alternative DB vendors with the same schema, for applications
where the user chooses the database vendor from a list of supported ones.

For more on Oracle optimizer hints see
http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm.

As for the name of the hint itself I would propose:

--> PROBABLY(True) -- the current default
--> PROBABLY(False)
--> PROBABLY(False, 0.7)
--> PROBABLY(False, 0.6, 0.3)  --re "pedantic detail", the second value if
for True, the remainder for NULL.

Kind regards,

Constantine Yannakopoulos
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Tony Papadimitriou
In reply to this post by Richard Hipp-3
How about:

maybe(COLUMN LIKE '%pattern%',.95)

or (as percent using integer value in 0..100)

maybe(COLUMN LIKE '%pattern%',95)

with a default value of (possibly) 50% (or .5) for the optional second arg?

-----Original Message-----
From: Richard Hipp
Sent: Tuesday, September 10, 2013 10:26 PM
To: General Discussion of SQLite Database
Subject: [sqlite] Hints for the query planner

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Hick Gunter
In reply to this post by Richard Hipp-3
(6) maybe(EXPR)

-----Ursprüngliche Nachricht-----
Von: Richard Hipp [mailto:[hidden email]]
Gesendet: Dienstag, 10. September 2013 21:27
An: General Discussion of SQLite Database
Betreff: [sqlite] Hints for the query planner

There is a survey question at the bottom of this message.  But first some context...

Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner.  The query under discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it to do a better job.  In particular, the query planner does not know how often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true.  Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true.  Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using a magic "unlikely()" function.  Subexpressions contained within "unlikely()" are assumed to usually not be true.  Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true.  So, if an application developer knows that the string "bach" seldom occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different query plan that works better when the subexpression is commonly false.  Or it might decide that the original query plan was good enough and ignore the hint.  The query planner gets to make that decision.  The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite.  But I want a more general solution.  Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything.  A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards.  I don't know if this matters.  The optional second argument is not guaranteed to make it into an actually release.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--------------------------------------------------------------------------
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: [hidden email]

This e-mail is confidential and may well also be legally privileged. If you have received it in error, you are on notice as to its status and accordingly please notify us immediately by reply e-mail and then delete this message from your system. Please do not copy it or use it for any purposes, or disclose its contents to any person as to do so could be a breach of confidence. Thank you for your cooperation.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Baruch Burstein-2
In reply to this post by Constantine Yannakopoulos
I also think it should not be directly in the SQL. I like the
not-really-a-comment syntax. Another option might be a few PRAGMAs,
something like

PRAGMA hint("table1.col1 IN (1,2,5)", 0.05);
PRAGMA hint("table1.col2 LIKE '%bach%'". 0.4);

these would add the hints to an internal table. When preparing a query, the
planner would check the hints table to see if any hints match the
table/column/condition triplet, and if so optionally use the hint. Removing
a hint and removing all hints would also be a couple of PRAGMAs.


On Wed, Sep 11, 2013 at 3:53 AM, kyan <[hidden email]> wrote:

> Hello Dr Hipp,
>
> First of all, I apologize for this rather off-topic suggestion knowing that
> you may have already implemented the syntax you describe, but there is an
> IMHO good reason for it, read ahead.
>
> On Tue, Sep 10, 2013 at 10:26 PM, Richard Hipp <[hidden email]> wrote:
>
> > SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE unlikely(cname LIKE '%bach%')
> >    AND composer.cid=track.cid
> >    AND album.aid=track.aid;
> >
>
> I would prefer that the planner hint is not interleaved inside normal SQL
> syntax. Instead I propose a special comment-like syntax instead, as
> Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">":
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE cname LIKE '%bach%'
> > /*> unlikely */
> >  AND composer.cid=track.cid AND album.aid=track.aid;
> >
>
> or:
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE cname LIKE '%bach%'
> > --> unlikely
> >    AND composer.cid=track.cid
> >    AND album.aid=track.aid;
>
>
> If the hint is to be applied to an expression that combines many column
> predicates with AND (I am not sure if this actually makes sense):
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE unlikely(cname LIKE '%bach%'
> >    AND composer.cid=track.cid)
> >    AND album.aid=track.aid;
> >
>
> then a -normally redundant- pair of parentheses can be used to specify the
> scope of the hint:
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /*> unlikely */
> >    AND album.aid=track.aid;
> >
>
> The SQLite SQL parser will have to look for exactly "/*>" or "-->" without
> whitespace between the characters, so it can easily tell a planner hint
> from a plain comment with a single character read-ahead. Also, the fact
> that hints are "transparent" to the SQL syntax will allow the query parser
> to handle them in an "orthogonal" way (e.g. a small separate parser for
> hints) to normal SQL parsing, IMO making handling of any future hints
> easier to add.
>
> The main reason for this proposal is that the planner hint will be ignored
> by default by other SQL parsers without the need to modify them, which in
> some cases may not even be possible. For instance it will allow someone to
> write SQL that is valid in databases of alternative DB vendors and still
> provide planner hints when the DB vendor is SQLite (that is why I replaced
> "+" with ">", to avoid conflicts with a hypothetical alternate Oracle query
> optimizer) without having to modify the SQL in the application code to
> remove the hints. This is a property of the Oracle optimizer hint syntax I
> have always appreciated when writing SQL that is to be executed in
> databases of alternative DB vendors with the same schema, for applications
> where the user chooses the database vendor from a list of supported ones.
>
> For more on Oracle optimizer hints see
> http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm.
>
> As for the name of the hint itself I would propose:
>
> --> PROBABLY(True) -- the current default
> --> PROBABLY(False)
> --> PROBABLY(False, 0.7)
> --> PROBABLY(False, 0.6, 0.3)  --re "pedantic detail", the second value if
> for True, the remainder for NULL.
>
> Kind regards,
>
> Constantine Yannakopoulos
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



--
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Konrad Hambrick
In reply to this post by Richard Hipp-3
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Richard Hipp
> Sent: Tuesday, September 10, 2013 2:27 PM
> To: General Discussion of SQLite Database
> Subject: [sqlite] Hints for the query planner
>
> There is a survey question at the bottom of this message.  But first some
> context...
>
<snip>

>
> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.
>
> ADDITIONAL INFORMATION:
>
> The current implementation allows a second argument which must be a
> floating point constant between 0.0 and 1.0, inclusive. The second argument
> is an estimate of the probability that the expression in the first argument
> will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
> well when this probability is small, but if the second argument is close to
> 1.0, then those names seem backwards.  I don't know if this matters.  The
> optional second argument is not guaranteed to make it into an actually
> release.

All --

Since the optional second arg is not guaranteed to make it into a release,
I like (3) - SELDOM( EXPR ) ...

----------------------- cut  here --------------------------
SELECT DISTINCT aname
  FROM album, composer, track
 WHERE SELDOM( cname LIKE '%bach%' )
   AND composer.cid=track.cid
   AND album.aid=track.aid
;
----------------------- cut there --------------------------

-- kjh

_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Ralf Junker
In reply to this post by Richard Hipp-3
I suggest a verb to express what the function is actually doing, namely
to reduce its argument in rank or degree for the query planner:

DEGRADE

1. to reduce in worth, character, etc; disgrace;
2. to reduce in rank, status, or degree; remove from office;
3. to reduce in strength, quality, intensity, etc

Source: http://www.collinsdictionary.com/dictionary/english/degrade

On 10.09.2013 21:26, Richard Hipp wrote:

> Please feel free to suggest other names if you think of any.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Hints for the query planner

Ralf Junker
On 11.09.2013 16:07, Ryan Johnson wrote:

> Perhaps you meant "demote" rather than "degrade" ? That would be a
> better fit (an external action that does not necessarily make the
> object worse or less useful), and less vague, but it still carries a
> negative connotation.

"demote" sounds fine to me, especially since its antonym "promote" may
be used for a function name to raise an expression's rank for the query
planner rather than the 2nd argument.

The negative connotation of both "degrade" and "demote" does not feel
bad for me as a non native English speaker. Both, however, express an
action rather than a quality which is more telling to me than "unlikely"
or the other adjectives suggested so far.

Maybe the function name could be prefixed by "qp_" (for query planner)
or similar to clarify their functionality even more: "qp_demote" and
"qp_promote"?

Ralf
_______________________________________________
sqlite-users mailing list
[hidden email]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
12