Quantcast

All versions compatible "very big" estimatedCost (Virtual tables)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

All versions compatible "very big" estimatedCost (Virtual tables)

Max Vlasov
Hi,

I sometimes use virtual tables to implement some kind of one-to-many
output. One of examples mentioned previously was the comma list virtual
table when a field containing comma-separated values might be used to
output rows of values from this list. Other example - performing
regexp-like query against a text and outputting the result columns. But
this involves assuming that some columns of the virtual table are required
"input" parameters while the other is "output".  For example, for comma
list, the required parameter is the list, the output - extracted values.
The consequence of this trick is that this virtual table is not fully
functional one, so you can't query select * from it, it's useful only by
providing where or "join .. on" clause containing the required "input"
parameters.

I usually encouraged sqlite to provide all required input parameters by
reporting a "cheap" value of estimatedCost when I recognized my "input"
parameters in xBestIndex call and providing an "expensive" values for any
other cases. This usually worked for simple and complex cases until
3.8.0.0. It seems the version introduced Next-Generation Query Planner and
I noticed that in some complex cases my cheap-expensive recommendations
were ignored, so even when I noticed in the debugger that cheap value was
provided for the right index and expensive for all other cases, the
following xFilter provided not all values required probably deciding it
knew better :). Before this I used cheap value 1 and expensive value 10000.
Testing (3.17.0) revealed that in some cases multiplying expensive by x100
helped, but other cases required increasing this value even more.

So, what is the maximum reasonable value of estimatedCost that will not
turn sqlite into possible overflow errors while telling at the same time
that I consider some variant very, very expensive? Or maybe changing cheap
from 1 to 0 will do the trick?

Thanks

Max
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: All versions compatible "very big" estimatedCost (Virtual tables)

Dominique Devienne
On Tue, Mar 28, 2017 at 10:26 AM, Max Vlasov <[hidden email]> wrote:

> I sometimes use virtual tables to implement some kind of one-to-many
> output. One of examples mentioned previously was the comma list virtual
> table when a field containing comma-separated values might be used to
> output rows of values from this list. Other example - performing
> regexp-like query against a text and outputting the result columns. But
> this involves assuming that some columns of the virtual table are required
> "input" parameters while the other is "output".  For example, for comma
> list, the required parameter is the list, the output - extracted values.
> The consequence of this trick is that this virtual table is not fully
> functional one, so you can't query select * from it, it's useful only by
> providing where or "join .. on" clause containing the required "input"
> parameters.
>
> I usually encouraged sqlite to provide all required input parameters by
> reporting a "cheap" value of estimatedCost when I recognized my "input"
> parameters in xBestIndex call and providing an "expensive" values for any
> other cases. This usually worked for simple and complex cases until
> 3.8.0.0. It seems the version introduced Next-Generation Query Planner and
> I noticed that in some complex cases my cheap-expensive recommendations
> were ignored, so even when I noticed in the debugger that cheap value was
> provided for the right index and expensive for all other cases, the
> following xFilter provided not all values required probably deciding it
> knew better :). Before this I used cheap value 1 and expensive value 10000.
> Testing (3.17.0) revealed that in some cases multiplying expensive by x100
> helped, but other cases required increasing this value even more.
>
> So, what is the maximum reasonable value of estimatedCost that will not
> turn sqlite into possible overflow errors while telling at the same time
> that I consider some variant very, very expensive? Or maybe changing cheap
> from 1 to 0 will do the trick?
>

FWIW, I've often wondered about the cost estimates of real tables versus
virtual tables,
especially since many vtables implementations don't involve real IO but
pure in-memory
computations. There's very little advice or documentation on this important
subject, and
Max's email reveals that empirical testing leading to ad-hoc heuristics are
vulnerable to
breaking when SQLite itself evolves. More guidance and perhaps even some
"normative"
documentation is needed IMHO. Thanks, --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: All versions compatible "very big" estimatedCost (Virtual tables)

Hick Gunter
>FWIW, I've often wondered about the cost estimates of real tables versus virtual tables, especially since many vtables implementations don't involve real IO but pure in-memory computations. There's >very little advice or documentation on this important subject, and Max's email reveals that empirical testing leading to ad-hoc heuristics are vulnerable to breaking when SQLite itself evolves. More >guidance and perhaps even some "normative"
>documentation is needed IMHO. Thanks, --DD _______________________________________________

The "estimated cost" is described as "how many disk IO operations are expected". Version higher than 3.8.2 allow setting an "estimatedRows" (default: 25) and versions above 3.9.0 allow setting a flag that indicates that 0 or 1 rows will be returned (i.e. the constraints form a "unique index").


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: All versions compatible "very big" estimatedCost (Virtual tables)

Dominique Devienne
On Tue, Mar 28, 2017 at 11:00 AM, Hick Gunter <[hidden email]> wrote:

> >FWIW, I've often wondered about the cost estimates of real tables versus
> virtual tables, especially since many vtables implementations don't involve
> real IO but pure in-memory computations. There's >very little advice or
> documentation on this important subject, and Max's email reveals that
> empirical testing leading to ad-hoc heuristics are vulnerable to breaking
> when SQLite itself evolves. More >guidance and perhaps even some "normative"
> >documentation is needed IMHO. Thanks, --DD ______________________________
> _________________
>
> The "estimated cost" is described as "how many disk IO operations are
> expected". Version higher than 3.8.2 allow setting an "estimatedRows"
> (default: 25) and versions above 3.9.0 allow setting a flag that indicates
> that 0 or 1 rows will be returned (i.e. the constraints form a "unique
> index").
>

Thanks for the RTFM. That helps. Looks like I have about 3 years worth of
catch-up to do on sqlite3_index_info.

Sorry for hijacking Max's thread, which is why his costs seems to be
ignored. --DD

  /* Fields below are only available in SQLite 3.8.2 and later */
  sqlite3_int64 estimatedRows;    /* Estimated number of rows returned */
  /* Fields below are only available in SQLite 3.9.0 and later */
  int idxFlags;              /* Mask of SQLITE_INDEX_SCAN_* flags */
  /* Fields below are only available in SQLite 3.10.0 and later */
  sqlite3_uint64 colUsed;    /* Input: Mask of columns used by statement */
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: All versions compatible "very big" estimatedCost (Virtual tables)

Max Vlasov
On Tue, Mar 28, 2017 at 12:51 PM, Dominique Devienne <[hidden email]>
wrote:

> On Tue, Mar 28, 2017 at 11:00 AM, Hick Gunter <[hidden email]> wrote:
>
> >
> > The "estimated cost" is described as "how many disk IO operations are
> > expected". Version higher than 3.8.2 allow setting an "estimatedRows"
> > (default: 25) and versions above 3.9.0 allow setting a flag that
> indicates
> > that 0 or 1 rows will be returned (i.e. the constraints form a "unique
> > index").
> >
>
> Thanks for the RTFM. That helps. Looks like I have about 3 years worth of
> catch-up to do on sqlite3_index_info.
>
>

Thanks, Dominique, Hick

I looked at the additional fields of sqlite3_index_info and probably
they're of no use for me, at least they don't provide some enforcing of my
particular index. As for my problem queries, they usually involve "order
by" or "group by", that's when the planner (despite the obvious bias from
me about the particular index) decides to provide only one of my two
required fields.

Also interesting that actually what I called input/output approach, sqlite
implemented calling it table-valued functions (probably starting 3.9.0) (
https://sqlite.org/vtab.html#tabfunc2). To make my tables compatible with
this syntax, I just had to append HIDDEN to the "input" fileds of the table
definition. But seems like changing to this definition doesn't affect the
planner.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: All versions compatible "very big" estimatedCost (Virtual tables)

Hick Gunter
You can always use CROSS JOIN to force a specific join order as in:

SELECT ... FROM mytable m CROSS_JOIN split s ON (s.input = m.string_field) JOIN anothertable a ON (a.field = s.output) ...

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Max Vlasov
Gesendet: Dienstag, 28. März 2017 13:37
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables)

On Tue, Mar 28, 2017 at 12:51 PM, Dominique Devienne <[hidden email]>
wrote:

> On Tue, Mar 28, 2017 at 11:00 AM, Hick Gunter <[hidden email]> wrote:
>
> >
> > The "estimated cost" is described as "how many disk IO operations
> > are expected". Version higher than 3.8.2 allow setting an "estimatedRows"
> > (default: 25) and versions above 3.9.0 allow setting a flag that
> indicates
> > that 0 or 1 rows will be returned (i.e. the constraints form a
> > "unique index").
> >
>
> Thanks for the RTFM. That helps. Looks like I have about 3 years worth
> of catch-up to do on sqlite3_index_info.
>
>

Thanks, Dominique, Hick

I looked at the additional fields of sqlite3_index_info and probably they're of no use for me, at least they don't provide some enforcing of my particular index. As for my problem queries, they usually involve "order by" or "group by", that's when the planner (despite the obvious bias from me about the particular index) decides to provide only one of my two required fields.

Also interesting that actually what I called input/output approach, sqlite implemented calling it table-valued functions (probably starting 3.9.0) ( https://sqlite.org/vtab.html#tabfunc2). To make my tables compatible with this syntax, I just had to append HIDDEN to the "input" fileds of the table definition. But seems like changing to this definition doesn't affect the planner.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: [hidden email]

This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation.


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: All versions compatible "very big" estimatedCost (Virtual tables)

Max Vlasov
In reply to this post by Max Vlasov
On Tue, Mar 28, 2017 at 11:26 AM, Max Vlasov <[hidden email]> wrote:

>
>
> So, what is the maximum reasonable value of estimatedCost that will not
> turn sqlite into possible overflow errors while telling at the same time
> that I consider some variant very, very expensive? Or maybe changing cheap
> from 1 to 0 will do the trick?
>
>
Hi again, replying to myself since I  noticed a more straightforward case
explaining estimatedCost peculiarities

Sqlite 3.17.0

My comma-list virtual table implementation (vtcommalist) reported the table
structure as
  CREATE TABLE [xxx] ([CommaList] TEXT HIDDEN, [Value] TEXT)
Basically it is only functional when CommaList is provided so it can
produce Value in this case based on coming CommaList. Hidden column here to
allow table-valued functions syntax.

Database:
The virtual table
  create virtual table [cmlist] Using VtCommaList
a simple test table with lists data
  create table  [lists] ([list] TEXT)
populated with 10000 rows
  insert into lists (list) values ('1, 2, 3, 4')

The query in question:
  Select distinct trim(value) From lists, cmlist(list)

Sqlite asks for index evaluation twice:
1. Suggesting constraint for CommaList field only, my code reported
"cheap"  (value 1)
2. Suggesting no constraint at all, my code reported "expensive" (value
1000000000)

But Sqlite still went for full-scan in this case (second choice) so I had
to report error leading to "SQL Logic error"
Only when I increased "expensive" to 1000000000*10 = 10000000000, Sqlite
accepted my cheap/expensive ratio as worth considering with following
filter with commalist field constraint.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Loading...