eponymous vtables, xBestIndex, and required parameters...

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

eponymous vtables, xBestIndex, and required parameters...

dave
Folks;
 
I am building a system which uses the virtual tables facilityto join some
non-relational data (coming from system APIs) to relational data.  I am
using the virtual table mechanism in sqlite, and in particular am creating
'eponymous' virtual tables to create 'table valued functions'.
 
I have a problem that puzzles me regarding the appropriate implementation of
the xBestIndex method, in particular in dealing with constraints that are
required for the table.  In my case, it is required that there be a "where
PARAMCOL = 'xxx'" condition somewhere in the query, or the table valued
function makes no sense.  The problem arises where the sqlite engine asks
for advice during the query planning phase, and sometimes sqlite chooses to
take a plan where NO constraints are applied.  A concrete example may help
clarify:
 
Given:
  1)  vtable 'fsdirinfo' projecting columns PATH, FNAME
      'PATH' is a directory, and is required, because this virtual table
does a directory listing of files in a particular path, and you can't get a
directory listing without supplying the path.
  2)  vtable 'verinfo' projecting columns FQPATH, FILEVER, COMPANYNAME,
PRODUCT, COMMENTS
      'FQPATH' is required, because you can't open a file (to read it's
version information) without having a [fully qualified] path to that file.
 
Many times, for simple queries like "select * from fsdirinfo where path =
'c:\windows\system32';"
This works as expected:  xBestIndex is called, it is noticed that there is
an equality constraint on path (as required), and that column is marked
thusly:
 
  pIdxInfo->aConstraintUsage[nIdx].argvIndex = nArgvIndex;
  pIdxInfo->aConstraintUsage[nIdx].omit = 1; //meaning 'sqlite does not need
to help filter more'

(please verify my interpretation of these fields).
 
I also went as far as adding some validation logic where if the required
constraints are not met, I emit an error:
 
  //see if we have met our requirements; it is required that we have an
equality
  //operation on column 1 (PATH), otherwise we can cope
  if ( SQLITE_INDEX_CONSTRAINT_EQ != afsde[1]._nOp )
  {
   /*XXX apparently, this can happen regardless of user SQL
   //must have required columns in constraint list. must must must.
   sqlite3_free( pThis->zErrMsg );
   pThis->zErrMsg = sqlite3_mprintf( VTBLNAME": There must be a constraint
on PATH" );
   return SQLITE_CONSTRAINT; //XXX something better?
  */
  }

but as you can see I commented that out, because sometimes sqlite asks 'what
if I just gave you no constraints, what would you think about that', and
then the error I emit causes the whole query to fail, even though I have
already indicated a query plan that would work.  (I.e., sqlite asks once
"how about constraining on 'path'" to which I reply "yes please", and then
it asked "well, what about constraining on nothing", to which I reply
"error".  Sqlite does not back off and use the other plan that was
'approved'.)
 
As a work-around I masked the invalid unconstrained condition in my xFilter
by simply returning no rows.  However, that approach has problems, too.
Consider this query, which consists of a join between the two vtables I
mentioned:
 
  select * from fsdirinfo fs, verinfo ( fs.PATH || '\' || fs.FNAME ) vi
  where
   fs.path = 'C:\Windows\System32'
   and ( fs.fname glob '*.exe' or fs.fname glob '*.dll' );
 
As it is presently, this returns.... No rows!  Huh!  Well, I stepped through
the code of course before writing in this question, and here's what happens:
1)  fsdirinfo is requested to query on path via equality and fname via glob,
and I can use both in this case
2)  verinfo is requested to query on fqpath via equality, and I can take
that
3)  verinfo gets requested a second time to query on nothing at all.  I have
no choice to say 'uhm, OK' and setup to return no rows, since my
communicating that is a error will abort the query
 
And sqlite then selects the item 3 approach.  Upon sqlite3_step(), fsdirinfo
is run through all the rows as expected, and verinfo is run through, but of
course gets no FQPATH value, and therefor cannot return any rows.  Net
result:  the join results in no rows.  (Incidentally, I have tried this
query functionally in three ways:  1) hidden columns for params (depicted
above), 2) constraints on projected columns (i.e. ye olde
cartesian-product-with-where-clause), 3) join syntax.  All three conventions
work fine, and in the same way.  Alas, that way yields no results, haha.)
 
OK, so with some context, now I can ask some questions (sorry for so much
exposition above).  Questions:
1)  is there an orthodox method of indicating that a query plan request from
xBestIndex is a no-go, for wahtever reason (in my case there must be
constraints on some columns), but does not abort the entire query, and
simply causes one of the other accepted plans to be used.  (I think this is
the single best option if it does exist).
2)  am I using the 'pIdxInfo->aConstraintUsage[nIdx].omit' variable
appropriately?  My interpretation is that means 'the vtable can fully handle
the constraint, sqlite does not need to do a double-check on it afterwards'.
I.e., in the example above, the vtable can totally handle it, but if there
was something like a regex on fname, the vtable can help narrow the results
down, but sqlite needs to do a final fer-real regex test to reject some that
select through anyway.  In that case omit would be 0.
 
Lastly, the third question regards my work-around.  I feel dirty doing this,
but if the query plan fails requirements, then instead of communicating an
error value, what I'm doing is setting the cost to infinity, e.g.
 
3)  is something like this the only hope:
 
 if ( 0 == pIdxInfo->idxNum )
 {
    pIdxInfo->estimatedCost = DBL_MAX;
 }
 else
 {
    pIdxInfo->estimatedCost = 10;
 }
 
Doing this did keep sqlite picking the 'approved' plan, but it just doesn't
feel like a deterministic solution to fiddle with query costs to avoid
catastrophe.
 
And I guess as a bonus 4th question:  What is the established orthodoxy in
picking estimatedCost anyway?  It seems from the source comments that it is
intended to mean 'approximate number of disk accesses', which I understand
qualitatively, but what is a 'disk access' quantitatively.  And how would I
compare that to, say, an API call that I am using as source data for my
vtable, which is purely in-memory, but could conceivably be quite expensive
(e.g. network stuff), so how would I tweak that?
 
OK!  Thanks so much if you read this far!  And thanks even more if anyone
can advise on how to deal with xBestIndex and required constraints on an
eponymous vtable acting as a table-valued function!
 
Cheers;
-dave


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: eponymous vtables, xBestIndex, and required parameters...

Richard Hipp-3
On 10/4/17, dave <[hidden email]> wrote:
> 1)  is there an orthodox method of indicating that a query plan request from
> xBestIndex is a no-go,

Give that plan a huge estimatedCost.

As a backup, in the exceedingly unlikely event that SQLite chooses
your no-go plan in spite of the huge estimatedCost, also provide a
unique idxNum and if xFilter sees that idxNum, have xFilter throw an
error with error message text that is something like "query planner
could not find an acceptable solution".


> 2)  am I using the 'pIdxInfo->aConstraintUsage[nIdx].omit' variable
> appropriately?  My interpretation is that means 'the vtable can fully handle
> the constraint, sqlite does not need to do a double-check on it afterwards'.

Yes.  Correct.


> And I guess as a bonus 4th question:  What is the established orthodoxy in
> picking estimatedCost anyway?

It is not overly sensitive to the scale of your cost estimates.   For
best results, let 1.0 be the same time as required to do a single
b-tree lookup on a one-page b-tree.  In other words, the cost of
reading a single page of the database file from disk into cache and
then doing a little decoding work to perform a binary search among the
approximately 200 keys on that page.

You don't know how to estimate that?  Then guess.  As long as the
relative costs for other invocations of xBestIndex on the same virtual
table are in reasonable proportion, everything should work fine.

In the source tree, go to the ext/misc folder and grep for
"estimatedCost" to see how some of the extension virtual tables do it.

--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: eponymous vtables, xBestIndex, and required parameters...

dave
> ...
> > 1)  is there an orthodox method of indicating that a query
> plan request from
> > xBestIndex is a no-go,
>
> Give that plan a huge estimatedCost.
>
> As a backup, in the exceedingly unlikely event that SQLite chooses
> your no-go plan in spite of the huge estimatedCost, also provide a
> unique idxNum and if xFilter sees that idxNum, have xFilter throw an
> error with error message text that is something like "query planner
> could not find an acceptable solution".
> ...
> And I guess as a bonus 4th question:  What is the established orthodoxy in
> picking estimatedCost anyway?
> ...
> It is not overly sensitive to the scale of your cost estimates.   For
> ...
> You don't know how to estimate that?  Then guess.  As long as the
> relative costs for other invocations of xBestIndex on the same virtual
> table are in reasonable proportion, everything should work fine.

Thanks!  I like the idxNum tweak for the error message; I'll add that stuff
in.

And the info about relative costs _on_the_same_virtual_table_ is very
enlightening because I suppose the converse is true, that the extimated cost
relative to OTHER virtual/physical tables does NOT matter.

-dave


_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: eponymous vtables, xBestIndex, and required parameters...

Richard Hipp-3
On 10/5/17, dave <[hidden email]> wrote:
>
> I suppose the converse is true, that the extimated cost
> relative to OTHER virtual/physical tables does NOT matter.
>

It matters some, but it is of lesser importance.
--
D. Richard Hipp
[hidden email]
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users