virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

dave
Hi folks,
 
I am trying to fully understand the impact and correct use of a few subtle
features related to virtual tables' the xBestIndex mechanism, and their
correct use.  Here are my current beliefs:
 
*  pIdxInfo->estimatedCost
  obviously the cost of the proposed plan; a metric of the 'viscosity' of
the table when traversing through xNext relative to other tables and
especially to filesystem access
*  pIdxInfo->estimatedRows
  obviously the approximate number of rows that a proposed plan will return.
But less obvious to me is how this materially affects the query plan,
especially relative to pIdxInfo->estimatedCost
  and a little bit with respect to:
* pIdxInfo->idxFlags
  when the SQLITE_INDEX_SCAN_UNIQUE is set.  Isn't setting
pIdxInfo->estimatedRows to 0 or 1 enough to communicate this same
information?
 
Anyway, I am dutifully setting both estimatedRows and idxFlags in cases
where I have a 0-or-1-result table (I have several of these), and I am also
estimatedRows to LLONG_MAX along with estimatedCost to DBL_MAX in cases
where a plan can never be executed (btw I would respectfully suggest perhaps
using a bit in idxFlags to communicate 'never use this plan, it will never
work').
 
I haven't had any ill effects doing the above, but wonder if that is
'correct'.  Also, it would be interesting just to know what the material
effect of estimatedRows and idxFlags is, so that I can maybe use them more
effectively.
 
Any thoughts or corrections to my thinking?  Thanks in advance; cheers!
-dave
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

Hick Gunter
I can provide some info coming from our experience with SQLite 3.7.14:

Since most SQl processing is IO bound, the "estimated cost" should be the number of disk IO operations required to retrieve the rows. The later addition of "estimated rows" reflects to the fact that some virtual table implementations might use non-disk storage (e.g. process or shared memory), where the number of IO operations is determined by the resident set and the cost of paging/swapping.

Lets say you have 10000 records of 200 bytes with 50 bytes of key overhead stored in some kind of ISAM file, and a page size of 4k.

Performing a full table scan will take an estimated 10000 * 200 / 4096 ~= 489 disk accesses, whereas looking up a single record will take about 3 (50 bytes per key in a 4096 byte page gives an estimated fan out of over 100, resulting in 2 pages to read from the index and 1 for the record itself). Performing a partial index scan that returns 100 records will take 2 acesses to locate the first record, 1 more if a second index page is required and anywhere between 5 (if the records are contiguous) and 100 (if each is from a separate page) accesses to retrieve the records themselves.

Regarding the UNIQUE flag, this is quite different from the number of estimated rows, which may be 0 or 1 due to rounding errors on a non-unique index (e.g. the initials of a set of 100 people has a cardinality of 26*26=676, giving an average number of 0,1479 records per index entry, but there may still be duplicates).

-----Urspr√ľngliche Nachricht-----
Von: sqlite-users [mailto:[hidden email]] Im Auftrag von dave
Gesendet: Donnerstag, 19. Oktober 2017 18:43
An: 'SQLite mailing list' <[hidden email]>
Betreff: [EXTERNAL] [sqlite] virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

Hi folks,

I am trying to fully understand the impact and correct use of a few subtle features related to virtual tables' the xBestIndex mechanism, and their correct use.  Here are my current beliefs:

*  pIdxInfo->estimatedCost
  obviously the cost of the proposed plan; a metric of the 'viscosity' of the table when traversing through xNext relative to other tables and especially to filesystem access
*  pIdxInfo->estimatedRows
  obviously the approximate number of rows that a proposed plan will return.
But less obvious to me is how this materially affects the query plan, especially relative to pIdxInfo->estimatedCost
  and a little bit with respect to:
* pIdxInfo->idxFlags
  when the SQLITE_INDEX_SCAN_UNIQUE is set.  Isn't setting
pIdxInfo->estimatedRows to 0 or 1 enough to communicate this same
information?

Anyway, I am dutifully setting both estimatedRows and idxFlags in cases where I have a 0-or-1-result table (I have several of these), and I am also estimatedRows to LLONG_MAX along with estimatedCost to DBL_MAX in cases where a plan can never be executed (btw I would respectfully suggest perhaps using a bit in idxFlags to communicate 'never use this plan, it will never work').

I haven't had any ill effects doing the above, but wonder if that is 'correct'.  Also, it would be interesting just to know what the material effect of estimatedRows and idxFlags is, so that I can maybe use them more effectively.

Any thoughts or corrections to my thinking?  Thanks in advance; cheers!
-dave
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: [EXTERNAL] virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

dave
> Behalf Of Hick Gunter
> Sent: Friday, October 20, 2017 1:55 AM
>
> I can provide some info coming from our experience with SQLite 3.7.14:
>
> Since most SQl processing is IO bound, the "estimated cost"
> should be the number of disk IO operations required to
> retrieve the rows. The later addition of "estimated rows"
> reflects to the fact that some virtual table implementations
> might use non-disk storage (e.g. process or shared memory),
> where the number of IO operations is determined by the
> resident set and the cost of paging/swapping.
>
> Lets say you have 10000 records of 200 bytes with 50 bytes of
> key overhead stored in some kind of ISAM file, and a page size of 4k.
>
> Performing a full table scan will take an estimated 10000 *
> 200 / 4096 ~= 489 disk accesses, whereas looking up a single
> record will take about 3 (50 bytes per key in a 4096 byte
> page gives an estimated fan out of over 100, resulting in 2
> pages to read from the index and 1 for the record itself).
> Performing a partial index scan that returns 100 records will
> take 2 acesses to locate the first record, 1 more if a second
> index page is required and anywhere between 5 (if the records
> are contiguous) and 100 (if each is from a separate page)
> accesses to retrieve the records themselves.
>
> Regarding the UNIQUE flag, this is quite different from the
> number of estimated rows, which may be 0 or 1 due to rounding
> errors on a non-unique index (e.g. the initials of a set of
> 100 people has a cardinality of 26*26=676, giving an average
> number of 0,1479 records per index entry, but there may still
> be duplicates).
 
Thanks so much, Hick, for the detailed info.

I guess I am still a little unclear about the importance of
SQLITE_INDEX_SCAN_UNIQUE, but I am interpreting your statement to mean
something like 'it is a more assertive statement about the number of rows
returned than the row estimate of an equal value', and that somehow guides
the query planner more strongly in some direction.  It also sounds like what
I was doing (described in my first message, here elided), was fine.

Thanks, and cheers!
-dave


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users