Estimated Costs and Memory DBs

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Estimated Costs and Memory DBs

Justin Olbrantz
While looking through the SQLite3 source trying to find answers to some
questions I had about virtual tables, I noticed that the memory DB is
implemented as a VFS rather than a database. Is my understanding correct
that this means that the estimated cost the query planner uses for memory
tables will be equal to that of the same database on disk? Shouldn't memory
DBs always have a much lower cost to cause the query planner to prefer
intensive operations on memory DBs rather than disk DBs? I know there's a
field in the table definition for cost multiplier that could perhaps be
used for this purpose, but as far as I can tell this is only ever used by
ANALYZE and it's theoretically impossible that memory DBs could even use it.

As for the question I was originally looking for an answer to, I am writing
a virtual table for a different file format, and it is expected that my
virtual table will be held completely in memory. What should I do with the
estimatedCost value from xBestIndex? According to the documentation this
should be an approximation of the number of disk accesses for the query,
which would be 0 in this case. But it's clearly vastly faster to do a query
on an indexed column, meaning the cost for an indexed column should be much
lower than the cost for an unindexed column. How should I be doing this?

--
Justin Olbrantz (Quantam)
"Ardente veritate
Urite mala mundi
Ardente veritate
Incendite tenebras mundi"
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Estimated Costs and Memory DBs

Dominique Devienne
On Wed, Jul 24, 2019 at 2:55 AM Justin Olbrantz <[hidden email]>
wrote:

> [...] my virtual table will be held completely in memory. What should I do
> with the
> estimatedCost value from xBestIndex? According to the documentation this
> should be an approximation of the number of disk accesses for the query,
> which would be 0 in this case. But it's clearly vastly faster to do a query
> on an indexed column, meaning the cost for an indexed column should be much
> lower than the cost for an unindexed column. How should I be doing this?
>

This is unfortunately similar to questions I asked on this ML in the past,
with no good answers as far as I remember...

For queries that mix "normal" disk-tables, in-memory-tables (i.e. the
equivalent
of a disk-table but with the DB file on a RAM disk, entirely in memory), and
virtual-tables (e.g. accessing a C++ in-memory container of structs), the
cost
structure of these 3 are quite different. The last 2 are entirely in-memory
both,
yet the last case is quite a bit faster still (no decoding of
pages/rows/varints needed).

I confess to not studying the code to try to answer that myself though...
--DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Estimated Costs and Memory DBs

Hick Gunter
The speed of a virtual table depends on the backing store and software used to implement it.

We have virtual tables that reference CTree files as well as virtual tables that reference memory sections here. The advantage is that the VT implementation can adjust it's answers in the xBestIndex function.

-----Urspr√ľngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Dominique Devienne
Gesendet: Mittwoch, 24. Juli 2019 10:02
An: SQLite mailing list <[hidden email]>
Betreff: [EXTERNAL] Re: [sqlite] Estimated Costs and Memory DBs

On Wed, Jul 24, 2019 at 2:55 AM Justin Olbrantz <[hidden email]>
wrote:

> [...] my virtual table will be held completely in memory. What should
> I do with the estimatedCost value from xBestIndex? According to the
> documentation this should be an approximation of the number of disk
> accesses for the query, which would be 0 in this case. But it's
> clearly vastly faster to do a query on an indexed column, meaning the
> cost for an indexed column should be much lower than the cost for an
> unindexed column. How should I be doing this?
>

This is unfortunately similar to questions I asked on this ML in the past, with no good answers as far as I remember...

For queries that mix "normal" disk-tables, in-memory-tables (i.e. the equivalent of a disk-table but with the DB file on a RAM disk, entirely in memory), and virtual-tables (e.g. accessing a C++ in-memory container of structs), the cost structure of these 3 are quite different. The last 2 are entirely in-memory both, yet the last case is quite a bit faster still (no decoding of pages/rows/varints needed).

I confess to not studying the code to try to answer that myself though...
--DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Estimated Costs and Memory DBs

Dominique Devienne
On Wed, Jul 24, 2019 at 10:45 AM Hick Gunter <[hidden email]> wrote:

> The speed of a virtual table depends on the backing store and software
> used to implement it.
>

[DD] Sure. virtual-tables can also access the disk and do expensive things.
[DD] I did say "example given" for my fast-pure-memory-no-decoding case.


> We have virtual tables that reference CTree files as well as virtual
> tables that reference memory sections here.

The advantage is that the VT implementation can adjust it's answers in the
> xBestIndex function.


[DD] I'm not sure I see your point. My point (and Justin's if I understand
him right), is that the relative
[DD] costs from tables vs virtual-tables is hard to figure out, which could
skew results of the planner
[DD] toward sub-optimal plans.

[DD] Most of my queries involve only my own virtual tables, so I use
arbitrary relative costs, like
[DD] 1 if returning a single row via a (virtual) unique index or PK, 2 if
returning a range of rows, and 4 for a full table scan.
[DD] But these "relative for my vtable costs" are probably completely wrong
when mixed with "real" tables,
[DD] disk-based or in-memory. There must be some meaningful correlations
between all costs for an optimal plan.
[DD] Or am I missing something? --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Estimated Costs and Memory DBs

Hick Gunter
With the current interface, the xBestIndex function has the possibility of returning "effort" and "result set size" separately, instead of just an aggregate "effort" (which was at the time documented to assume "result set size").

Since we have nearly no native tables here, the question of "relative to native tables" does not pop up.

For disk-based native tables and CTree based VT I would tend to assume about equal effort-to-result-set ratios, whereas memory section VTs are probably significantly faster.

Since the QP operates with logarithmic cost estimates, I return 1 for a single row unique key retreival and the number of records for a full table scan. Partial key retrievals score a best estimate based on the assumption of equal selectivity per key field (e.g. for a 1000 row table and a three field key, it would be 1 for full key, 10 for the leading 2 key fields, 100 for a single leading key field and 1000 for no key fields = full table scan).

This is also factored in when choosing between several indexes that all match the constraints to some degree (2 leading fields form a 3 field key is preferred over 3 leading fields from a 7 field key), and also considering a requested ORDER BY clause.


-----Urspr√ľngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von Dominique Devienne
Gesendet: Mittwoch, 24. Juli 2019 11:41
An: SQLite mailing list <[hidden email]>
Betreff: Re: [sqlite] [EXTERNAL] Re: Estimated Costs and Memory DBs

On Wed, Jul 24, 2019 at 10:45 AM Hick Gunter <[hidden email]> wrote:

> The speed of a virtual table depends on the backing store and software
> used to implement it.
>

[DD] Sure. virtual-tables can also access the disk and do expensive things.
[DD] I did say "example given" for my fast-pure-memory-no-decoding case.


> We have virtual tables that reference CTree files as well as virtual
> tables that reference memory sections here.

The advantage is that the VT implementation can adjust it's answers in the
> xBestIndex function.


[DD] I'm not sure I see your point. My point (and Justin's if I understand him right), is that the relative [DD] costs from tables vs virtual-tables is hard to figure out, which could skew results of the planner [DD] toward sub-optimal plans.

[DD] Most of my queries involve only my own virtual tables, so I use arbitrary relative costs, like [DD] 1 if returning a single row via a (virtual) unique index or PK, 2 if returning a range of rows, and 4 for a full table scan.
[DD] But these "relative for my vtable costs" are probably completely wrong when mixed with "real" tables, [DD] disk-based or in-memory. There must be some meaningful correlations between all costs for an optimal plan.
[DD] Or am I missing something? --DD
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] Re: Estimated Costs and Memory DBs

Dominique Devienne
On Wed, Jul 24, 2019 at 3:09 PM Hick Gunter <[hidden email]> wrote:

> With the current interface, the xBestIndex function has the possibility of
> returning "effort" and "result set size" separately, instead of just an
> aggregate "effort" (which was at the time documented to assume "result set
> size").
>

Thanks Gunter for the reminder. Indeed my vtables have been implemented a
long time ago, and only use estimatedCost.
I need to investigate estimatedRows and the other newer xBestIndex
features, include (re)reading several times the doc. --DD

From https://www.sqlite.org/c3ref/index_info.html

  double estimatedCost;           /* Estimated cost of using this index */
  /* Fields below are only available in SQLite 3.8.2 and later */
  sqlite3_int64 estimatedRows;    /* Estimated number of rows returned */
  /* Fields below are only available in SQLite 3.9.0 and later */
  int idxFlags;              /* Mask of SQLITE_INDEX_SCAN_* flags */
  /* Fields below are only available in SQLite 3.10.0 and later */
  sqlite3_uint64 colUsed;    /* Input: Mask of columns used by statement */
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users