Getting an advance list of RowIDs for a query result set

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

Getting an advance list of RowIDs for a query result set

curmudgeon
For a complex query you can often get a list of the base table RowIDs very quickly with a simple query and then use an array of those values (along with the carray virtual table) to retrieve sections of data from the complex query almost instantly. I've been doing this for a while but would love to find a way of doing it automatically. The problem is as follows

Suppose I have the following query

SELECT ColA, ColB, ColC, ....
FROM BaseTbl
jointype_1 Tbl1 on comparison_1
jointype_2 Tbl2 on comparison_2
.
.
jointype_n Tbln on comparison_n
WHERE ....
ORDER BY ....

and I want a list of the BaseTbl's rowid's for all records returned by the above query in
the order defined by ORDER BY. I also want it in the fastest possible way. In other words I
want

SELECT BaseTbl.RowID
FROM BaseTbl
jointype_1 Tbl1 on comparison_1
jointype_2 Tbl2 on comparison_2
.
.
jointype_n Tbln on comparison_n
WHERE ....
ORDER BY ....

with all redundant tables removed.

The question is, how to determine what tables are redundant?

Obviously any table involved in the WHERE or ORDER BY has to be retained aa does any table
acting as an intermediate to join those tables back to the BaseTbl.

Of the other tables I'm thinking I can discard table_i provided it's joined to the BaseTbl by a table_i unique index and comparison_i covers all columns of that index. It's fairly obvious if jointype_i == left join but I'm thinking it works also for inner join. Can anyone give me a counter example?

Anyone think of other ways?

I'm also wondering if the result produced by 'explain query plan' could simplify the coding although I know use of that isn't recommended as the structure of it is not guaranteed.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
A trivial example of what I'm trying to do. Given

select * from AwfyBigTbl where ACol=?;

I'd run the query

select RowID from AwfyBigTbl where ACol=?;

step through the records and store the values in a std::vector<int64_t>
called ID.
I could then retrieve n records starting at TopRecNo (0 based) with the
query

select * from (select Value from carray(ID+?1, ?2, 'int64'))
inner join AwfyBigTbl on AwfyBigTbl.RowID = _Value;

by setting ?1 = TopRecNo and ?2 = n.

I realise I could get the same result with

select * from AwfyBigTbl where ACol=? limit n offset TopRecNo;

but the first way will be faster and, as the record count - ID.size() - is
known , it would allow a
record counter to be shown, vertical scrollbar to be sized, last known
record to be relocated etc.

Anyway, what I'm trying to do is find a way of doing the same thing for more
complicated
queries. Given a select where a 'base table' is attached to lookup tables
how can I determine which of the lookup tables can be removed from the table
such that

select BaseTbl.RowID from ... where ... order by ...

will find the set of records that represents the original query.

Hope that's clearer.





--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
In reply to this post by curmudgeon
Sorry, in last post

select * from (select Value from carray(ID+?1, ?2, 'int64'))
inner join AwfyBigTbl on AwfyBigTbl.RowID = _Value;

by setting ?1 = TopRecNo and ?2 = n.

should read

select * from (select Value from carray(*?1*, ?2, 'int64'))
inner join AwfyBigTbl on AwfyBigTbl.RowID = _Value;

by setting ?1 = *ID + TopRecNo* and ?2 = n.




--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

Simon Slavin-3
In reply to this post by curmudgeon


On 25 Nov 2017, at 1:15pm, curmudgeon <[hidden email]> wrote:

> Given a select where a 'base table' is attached to lookup tables
> how can I determine which of the lookup tables can be removed from the table
> such that
>
> select BaseTbl.RowID from ... where ... order by ...
>
> will find the set of records that represents the original query.

That optimization could be done at the level of the SQL engine.  You wouldn’t want to do it inside your own code since that would make your code extremely complicated.  So just execute the query without trying to optimize it and see what happens.  Does it run fast enough for your users ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
Simon, I’ve no users. I’ve been teaching myself c++ (and SQLite) during a lengthy illness so this is just experimentation for me.



This subject touches on a previous question of mine you were involved in regarding redundant tables. The following may jog your memory



create table TblA(A integer primary key, B int, C int);

create table TblB(B integer primary key, BX int);

create table TblC(C integer primary key, CX int);



explain query plan select A from TblA left join TblB using (B) left join TblC using (C) where BX=?;

0|0|0|SCAN TABLE TblA

0|1|1|SEARCH TABLE TblB USING INTEGER PRIMARY KEY (rowid=?)



explain query plan select A from TblA left join TblB using (B) left join TblC using (C) where CX=?;

0|0|0|SCAN TABLE TblA

0|1|1|SEARCH TABLE TblB USING INTEGER PRIMARY KEY (rowid=?)

0|2|2|SEARCH TABLE TblC USING INTEGER PRIMARY KEY (rowid=?)



In the first explain SQLite drops the trailing redundant table but in the second explain it doesn’t drop the middle redundant table.



As TblB is included in the second query it must surely run slower than if it were omitted.



I’m not complaining about the SQLite optimiser failing to spot the redundancy as it’s got to deal with a variety of queries far removed from my narrow experiment. Checking for such redundancies would likely slow down prepares and, when it comes down to it, anyone including TblB in the second query is only getting what they asked for.



I do think though that it’s possible to write code to remove these redundancies so as to get the vector of RowIDs as fast as possible. So far I’ve been splitting SQL into ‘RowSQL’ (returns RowIDs involved in correct order) and ‘ColSQL’ (returns columns requested in original SQL for the requested range as shown in my second post) but I’ve only been doing it visually via knowledge of the tables. What I’m trying to do is write a function to automatically ‘split’ the sql into RowSQL and ColSQL. I’ll make another post later showing where I’m at with that.

________________________________
From: sqlite-users <[hidden email]> on behalf of Simon Slavin <[hidden email]>
Sent: Saturday, November 25, 2017 1:26:00 PM
To: SQLite mailing list
Subject: Re: [sqlite] Getting an advance list of RowIDs for a query result set



On 25 Nov 2017, at 1:15pm, curmudgeon <[hidden email]> wrote:

> Given a select where a 'base table' is attached to lookup tables
> how can I determine which of the lookup tables can be removed from the table
> such that
>
> select BaseTbl.RowID from ... where ... order by ...
>
> will find the set of records that represents the original query.

That optimization could be done at the level of the SQL engine.  You wouldn’t want to do it inside your own code since that would make your code extremely complicated.  So just execute the query without trying to optimize it and see what happens.  Does it run fast enough for your users ?

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

Keith Medcalf
In reply to this post by curmudgeon

This is fixed in the current head of trunk.  Although the implementation may change, it will appear in the next release.

https://www.sqlite.org/src/timeline?n=50

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of x
>Sent: Saturday, 25 November, 2017 10:08
>To: SQLite mailing list
>Subject: Re: [sqlite] Getting an advance list of RowIDs for a query
>result set
>
>Simon, I’ve no users. I’ve been teaching myself c++ (and SQLite)
>during a lengthy illness so this is just experimentation for me.
>
>
>
>This subject touches on a previous question of mine you were involved
>in regarding redundant tables. The following may jog your memory
>
>
>
>create table TblA(A integer primary key, B int, C int);
>
>create table TblB(B integer primary key, BX int);
>
>create table TblC(C integer primary key, CX int);
>
>
>
>explain query plan select A from TblA left join TblB using (B) left
>join TblC using (C) where BX=?;
>
>0|0|0|SCAN TABLE TblA
>
>0|1|1|SEARCH TABLE TblB USING INTEGER PRIMARY KEY (rowid=?)
>
>
>
>explain query plan select A from TblA left join TblB using (B) left
>join TblC using (C) where CX=?;
>
>0|0|0|SCAN TABLE TblA
>
>0|1|1|SEARCH TABLE TblB USING INTEGER PRIMARY KEY (rowid=?)
>
>0|2|2|SEARCH TABLE TblC USING INTEGER PRIMARY KEY (rowid=?)
>
>
>
>In the first explain SQLite drops the trailing redundant table but in
>the second explain it doesn’t drop the middle redundant table.
>
>
>
>As TblB is included in the second query it must surely run slower
>than if it were omitted.
>
>
>
>I’m not complaining about the SQLite optimiser failing to spot the
>redundancy as it’s got to deal with a variety of queries far removed
>from my narrow experiment. Checking for such redundancies would
>likely slow down prepares and, when it comes down to it, anyone
>including TblB in the second query is only getting what they asked
>for.
>
>
>
>I do think though that it’s possible to write code to remove these
>redundancies so as to get the vector of RowIDs as fast as possible.
>So far I’ve been splitting SQL into ‘RowSQL’ (returns RowIDs involved
>in correct order) and ‘ColSQL’ (returns columns requested in original
>SQL for the requested range as shown in my second post) but I’ve only
>been doing it visually via knowledge of the tables. What I’m trying
>to do is write a function to automatically ‘split’ the sql into
>RowSQL and ColSQL. I’ll make another post later showing where I’m at
>with that.
>
>________________________________
>From: sqlite-users <[hidden email]> on
>behalf of Simon Slavin <[hidden email]>
>Sent: Saturday, November 25, 2017 1:26:00 PM
>To: SQLite mailing list
>Subject: Re: [sqlite] Getting an advance list of RowIDs for a query
>result set
>
>
>
>On 25 Nov 2017, at 1:15pm, curmudgeon <[hidden email]> wrote:
>
>> Given a select where a 'base table' is attached to lookup tables
>> how can I determine which of the lookup tables can be removed from
>the table
>> such that
>>
>> select BaseTbl.RowID from ... where ... order by ...
>>
>> will find the set of records that represents the original query.
>
>That optimization could be done at the level of the SQL engine.  You
>wouldn’t want to do it inside your own code since that would make
>your code extremely complicated.  So just execute the query without
>trying to optimize it and see what happens.  Does it run fast enough
>for your users ?
>
>Simon.
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
Thanks Keith.

This one

Update the omit-table-from-left-join optimization so that it can omit tables from the middle of the join as well as the end.

Deals with the case I mentioned in my last post but this one is also interesting

Fix a problem preventing the planner from identifying scans that visit at most one row in cases where that property is guaranteed by a unique, not-null, non-IPK column that is the leftmost in its table

The way I was going to tackle the problem of redundant tables was as follows.

Given

SELECT ColA, ColB, ColC, ....
FROM BaseTbl
jointype_1 Tbl1 on comparison_1
jointype_2 Tbl2 on comparison_2
.
.
jointype_n Tbln on comparison_n
WHERE ....
ORDER BY ....

I want to find the redundant tables in the following query

SELECT BaseTbl.RowID
FROM BaseTbl
jointype_1 Tbl1 on comparison_1
jointype_2 Tbl2 on comparison_2
.
.
jointype_n Tbln on comparison_n
WHERE ....
ORDER BY ....

I proceed as follows


  1.  Omit a table join from the SQL and try preparing it.
  2.  If it prepares OK then the table isn’t involved in the WHERE or ORDER BY.
  3.  If it’s joined to the BaseTbl by an integer primary key or FULLY joined by a unique index then the table is redundant. By FULLY I mean ALL fields of the index are included in the join as otherwise the BaseTbl’s relationship with it is of a ONE TO MANY nature which means the BaseTbl RowID won’t uniquely identify a row of the original query. [Two things I’m unsure about are a) how nulls affect unique index  joins and b) how to deal with tables that aren’t directly linked to the BaseTbl (i.e. they’re linked via an intermediate table)].
  4.  If the table is needed reintroduce it into the SQL.

Do this in turn for each of the joins.

I’m wondering if the second of those trunk changes is in any way related to what I’m trying to do. The above is a bit long winded and not easy to code so it would be great if the SQLite query optimizer did it all for me.

To me the time taken to grab a grid page of data is negligible if you know where to look for it on disc. I tend therefore to time queries by how fast I can get all the BaseTbl RowIDs into a vector. The biggest table in my database has 2.4 million rows and yet wait cursors are a very rare sight.

Tom


From: Keith Medcalf<mailto:[hidden email]>
Sent: 25 November 2017 18:15
To: SQLite mailing list<mailto:[hidden email]>
Subject: Re: [sqlite] Getting an advance list of RowIDs for a query result set


>This is fixed in the current head of trunk.  Although the implementation may change, it will appear in the next release.

https://www.sqlite.org/src/timeline?n=50



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
In reply to this post by Keith Medcalf
>  If it’s joined to the BaseTbl by an integer primary key or FULLY joined by a unique index then the table is redundant.

I’m talking **** there. If it’s an inner join SQLite needs to check the record exists in the joined table.

Sorry about that, back to left joins.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

E.Pasma
In reply to this post by curmudgeon
x wrote:

>
> I proceed as follows
>
>
>  1.  Omit a table join from the SQL and try preparing it.
>  2.  If it prepares OK then the table isn’t involved in the WHERE or  
> ORDER BY.
>  3.  If it’s joined to the BaseTbl by an integer primary key or  
> FULLY joined by a unique index then the table is redundant. By FULLY  
> I mean ALL fields of the index are included in the join as otherwise  
> the BaseTbl’s relationship with it is of a ONE TO MANY nature which  
> means the BaseTbl RowID won’t uniquely identify a row of the  
> original query. [Two things I’m unsure about are a) how nulls affect  
> unique index  joins and b) how to deal with tables that aren’t  
> directly linked to the BaseTbl (i.e. they’re linked via an  
> intermediate table)].
>  4.  If the table is needed reintroduce it into the SQL.
>
> Do this in turn for each of the joins.
>
If step 3 is xxx-ed and only left-joins remain to be considered then  
the SQLite3 engine is likely to fullfill  this optimization in its  
next release.
I tested this on the current  (2017-11-17) pre-release snapshot. As  
far as I see any outer joined table may be discarded from the query  
plan if only rowid from the base table is selected. Using SELECT  
DISTINCT applies this also to one-to-many joins.


> I’m wondering if the second of those trunk changes is in any way  
> related to what I’m trying to do. The above is a bit long winded and  
> not easy to code so it would be great if the SQLite query optimizer  
> did it all for me.
>
It looks relevant, but I did not test that.

> To me the time taken to grab a grid page of data is negligible if  
> you know where to look for it on disc. I tend therefore to time  
> queries by how fast I can get all the BaseTbl RowIDs into a vector.  
> The biggest table in my database has 2.4 million rows and yet wait  
> cursors are a very rare sight.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon


From: E.Pasma<mailto:[hidden email]>
Sent: 26 November 2017 17:30
To: SQLite mailing list<mailto:[hidden email]>
Subject: Re: [sqlite] Getting an advance list of RowIDs for a query result set

>If step 3 is xxx-ed and only left-joins remain to be considered then
>the SQLite3 engine is likely to fullfill  this optimization in its
>next release.
>I tested this on the current  (2017-11-17) pre-release snapshot. As
>far as I see any outer joined table may be discarded from the query
>plan if only rowid from the base table is selected. Using SELECT
>DISTINCT applies this also to one-to-many joins.

Thanks E. Pasma. Fixing the left join optimisation will be enough for me and will allow me to delete a fair amount of difficult code. My confusion over the inner joins arose out of trying to find a way of ensuring the BaseTbl RowIDs wouldn’t contain any duplicates. Somewhere along the line I forgot SQLite would have to still check the record existed in the inner joined table.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

E.Pasma
x wrote:

>
> From: E.Pasma<mailto:[hidden email]>
> Sent: 26 November 2017 17:30
> To: SQLite mailing list<mailto:[hidden email]>
> Subject: Re: [sqlite] Getting an advance list of RowIDs for a query  
> result set
>
>> If step 3 is xxx-ed and only left-joins remain to be considered then
>> the SQLite3 engine is likely to fullfill  this optimization in its
>> next release.
>> I tested this on the current  (2017-11-17) pre-release snapshot. As
>> far as I see any outer joined table may be discarded from the query
>> plan if only rowid from the base table is selected. Using SELECT
>> DISTINCT applies this also to one-to-many joins.
>
> Thanks E. Pasma. Fixing the left join optimisation will be enough  
> for me and will allow me to delete a fair amount of difficult code.  
> My confusion over the inner joins arose out of trying to find a way  
> of ensuring the BaseTbl RowIDs wouldn’t contain any duplicates.  
> Somewhere along the line I forgot SQLite would have to still check  
> the record existed in the inner joined table.


Thanks to you, this topic has inspired a useful change or changes.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
>Thanks to you, this topic has inspired a useful change or changes.

Wow. Glad I could help.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

E.Pasma
So if I build a view that includes look-ups in other tables, the  
optimizer may skip these at places where not selected. However only if  
the look-ups are written as outer joins. Then it may be good practice  
allways doing that. For instance:

create view vtrack as
select  trackname, artistname
from track
left join artist ON trackartist=artistid -- note left join
;
~
~

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
>So if I build a view that includes look-ups in other tables, the
>optimizer may skip these at places where not selected. However only if
>the look-ups are written as outer joins. Then it may be good practice
>allways doing that. For instance:

>create view vtrack as
>select  trackname, artistname
>from track
>left join artist ON trackartist=artistid -- note left join
;
~
~

Yeah, I was thinking always to use left joins when it doesn’t make any difference. I’m using a C++ wrapper for SQLite I wrote myself. It contains a QueryGrid type that automatically splits the SQL into RowSQL and ColSQL as described earlier. It is still possible to build a vector of BaseTbl.RowIDs when the original query contains inner joins provided all columns of the unique index used in the join are satisfied (see ***Example). If it’s unable to do that (due to 1 to many joins)  it computes ‘select count(*) from .....’ and sets ColSQL to ‘select ColList from ... limit ?1, ?2’ so that it still fits in with the ‘fetch a range of data’ modus operandi and also allows a record counter and vertical scrollbar positioning.

***Example

select ColList
from BaseTbl
left join Tbl_1 on Tbl_1.X = BaseTbl.X
Inner join Tbl_2 on Tbl2.Y = BaseTbl.Y
left join Tbl_3 on Tbl3.Z = BaseTbl.Z
where BaseTbl.Col=?

Provided Tbl_2 has unique index on Y the RowID list could be had from

Select BaseTbl.RowID
from BaseTbl
left join Tbl_1 on Tbl_1.X = BaseTbl.X
Inner join Tbl_2 on Tbl2.Y = BaseTbl.Y
left join Tbl_3 on Tbl3.Z = BaseTbl.Z
where BaseTbl.Col=?

which the SQLite query optimiser would presumably reduce to

Select BaseTbl.RowID
from BaseTbl
Inner join Tbl_2 on Tbl2.Y = BaseTbl.Y
where BaseTbl.Col=?

I’ve still to write code to check the query contains no joins that are ‘1 to many’. I don’t suppose there’s an easy way of determining this from explain or explain query plan? At the moment all i can think of is to scrape the index name from the explain query plan Detail column, check the number of variables involved (i.e. number of ? marks) and compare with the number of columns in the index.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

E.Pasma

Op 27 nov 2017, om 20:51 heeft x het volgende geschreven:

>> So if I build a view that includes look-ups in other tables, the
>> optimizer may skip these at places where not selected. However only  
>> if
>> the look-ups are written as outer joins. Then it may be good practice
>> allways doing that. For instance:
>
>> create view vtrack as
>> select  trackname, artistname
>> from track
>> left join artist ON trackartist=artistid -- note left join
> ;
> ~
> ~
>
> Yeah, I was thinking always to use left joins when it doesn’t make  
> any difference. I’m using a C++ wrapper for SQLite I wrote myself.  
> It contains a QueryGrid type that automatically splits the SQL into  
> RowSQL and ColSQL as described earlier. It is still possible to  
> build a vector of BaseTbl.RowIDs when the original query contains  
> inner joins provided all columns of the unique index used in the  
> join are satisfied (see ***Example). If it’s unable to do that (due  
> to 1 to many joins)  it computes ‘select count(*) from .....’ and  
> sets ColSQL to ‘select ColList from ... limit ?1, ?2’ so that it  
> still fits in with the ‘fetch a range of data’ modus operandi and  
> also allows a record counter and vertical scrollbar positioning.
>
> ***Example
>
> select ColList
> from BaseTbl
> left join Tbl_1 on Tbl_1.X = BaseTbl.X
> Inner join Tbl_2 on Tbl2.Y = BaseTbl.Y
> left join Tbl_3 on Tbl3.Z = BaseTbl.Z
> where BaseTbl.Col=?
>
> Provided Tbl_2 has unique index on Y the RowID list could be had from
>
> Select BaseTbl.RowID
> from BaseTbl
> left join Tbl_1 on Tbl_1.X = BaseTbl.X
> Inner join Tbl_2 on Tbl2.Y = BaseTbl.Y
> left join Tbl_3 on Tbl3.Z = BaseTbl.Z
> where BaseTbl.Col=?
>
> which the SQLite query optimiser would presumably reduce to
>
> Select BaseTbl.RowID
> from BaseTbl
> Inner join Tbl_2 on Tbl2.Y = BaseTbl.Y
> where BaseTbl.Col=?
>
> I’ve still to write code to check the query contains no joins that  
> are ‘1 to many’. I don’t suppose there’s an easy way of determining  
> this from explain or explain query plan? At the moment all i can  
> think of is to scrape the index name from the explain query plan  
> Detail column, check the number of variables involved (i.e. number  
> of ? marks) and compare with the number of columns in the index.


What about changing the remaining inner join to left join

Select BaseTbl.RowID
from BaseTbl
left join Tbl_2 on Tbl2.Y = BaseTbl.Y
where BaseTbl.Col=?

and see if the SQLiter optimizer now leaves Tbl_2 out from the query  
plan.  It will only do that if it is not a 1-to-n join.
But that leaves you with the change that the join is 1-to-0 so to say.  
I understood that was a show stopper.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
>What about changing the remaining inner join to left join

>Select BaseTbl.RowID
>from BaseTbl
>left join Tbl_2 on Tbl2.Y = BaseTbl.Y
>where BaseTbl.Col=?

>and see if the SQLiter optimizer now leaves Tbl_2 out from the query
>plan.  It will only do that if it is not a 1-to-n join.

If Tbl_2 isn’t involved in the columns, where or order by then changing it to left join will mean it will definitely be left out so I don’t get what you mean E.Pasma.


I’ve now realised what caused my earlier confusion regarding inner joins. I was getting mixed up with foreign keys which do offer a way of finding redundant inner joins.

If a Tbl (which is not involved in the query columns, where or order by) is inner joined to BaseTbl on all the columns of a unique index and it turns out there is a foreign key matching that join then the Tbl can be left out if the BaseTbl columns are defined as NOT NULL. If they are allowed to be NULL then it can still be left out provided the NOT NULL condition(s) is added to the where.

Using the earlier example

Select BaseTbl.RowID
from BaseTbl
left join Tbl_1 on Tbl_1.X = BaseTbl.X
Inner join Tbl_2 on Tbl2.Y = BaseTbl.Y
left join Tbl_3 on Tbl3.Z = BaseTbl.Z
where BaseTbl.Col=?

If BaseTbl has FOREIGN KEY(Y) REFERENCES Tbl_2(Y) then

If BaseTbl.Y is defined as NOT NULL the query can be reduced to

Select BaseTbl.RowID
from BaseTbl
where BaseTbl.Col=?

If BaseTbl.Y allows NULLs then the query can be reduced to

Select BaseTbl.RowID
from BaseTbl
where BaseTbl.Col=? and BaseTbl.Y IS NOT NULL

Could someone confirm I’ve got the above right?

If I’m correct, I wonder if the optimiser takes (or could take) this into account. As lookup tables are probably the main use of foreign keys you’d think there’d be no shortage of such joins. That said, I suppose if you’re aware of what you want the optimiser to do it wouldn’t offer any advantage over left joining and adding the appropriate NOT NULL constraint(s) to the WHERE.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
E.Pasma wrote

>> What about changing the remaining inner join to left join
>
>> Select BaseTbl.RowID
>> from BaseTbl
>> left join Tbl_2 on Tbl2.Y = BaseTbl.Y
>> where BaseTbl.Col=?
>
>> and see if the SQLiter optimizer now leaves Tbl_2 out from the query
>> plan.  It will only do that if it is not a 1-to-n join.

I replied

> If Tbl_2 isn’t involved in the columns, where or order by then  
> changing it to left join will mean it will definitely be left out so  
> I don’t get what you mean E.Pasma.

I’m talking bollocks again. The left join will only be omitted if it has a 1
to 1 relationship with BaseTbl so E.Pasma is correct.


I’m now wondering if you omit the WHERE & ORDER BY and run the following

EXPLAIN QUERY PLAN
SELECT BaseTbl.RowID
FROM BaseTbl
left join Tbl1 on comparison_1
left join Tbl2 on comparison_2
.
.
left join Tbln on comparison_n

then if it returns more than 1 row then this implies there’s a 1 to many
relationship in the query and we can’t proceed.  (At this stage I’m by no
means sure of this).

Otherwise the RowSQL becomes

SELECT BaseTbl.RowID
FROM BaseTbl
jointype_1 Tbl1 on comparison_1
jointype_2 Tbl2 on comparison_2
.
.
jointype_n Tbln on comparison_n
WHERE ...
ORDER BY ...

And we leave the SQLite optimiser to filter out any left joins that aren’t
required.






--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Getting an advance list of RowIDs for a query result set

curmudgeon
*I’m now wondering if you omit the WHERE & ORDER BY and run the following

EXPLAIN QUERY PLAN
SELECT BaseTbl.RowID
FROM BaseTbl
left join Tbl1 on comparison_1
left join Tbl2 on comparison_2
.
.
left join Tbln on comparison_n

then if it returns more than 1 row then this implies there’s a 1 to many
relationship in the query and we can’t proceed.  (At this stage I’m by no
means sure of this).
*

The above isn't guaranteed to work. Consider the following

CREATE TABLE t1(a INTEGER PRIMARY KEY, b INTEGER);
CREATE TABLE t2(x INTEGER PRIMARY KEY, y INTEGER);
CREATE INDEX ib ON t1(b);
EXPLAIN QUERY PLAN SELECT t1.RowID FROM t1 LEFT JOIN t2 ON x = b;
0|0|0|SCAN TABLE t1

Everything above is as expected, t2.x is primary key lookup for t1.b so
trailing left join is dropped. Suppose though we made t2.y the lookup for
t1.b

CREATE UNIQUE INDEX iy ON t2(y);
EXPLAIN QUERY PLAN SELECT t1.RowID FROM t1 LEFT JOIN t2 ON y = b;
0|0|0|SCAN TABLE t1
0|1|1|SEARCH TABLE t2 USING COVERING INDEX iy (y=?)

As you can see what at first sight looks like a redundant trailing left join
ISN’T dropped. If however we had defined t2 with

CREATE TABLE t2(x INTEGER PRIMARY KEY, y INTEGER NOT NULL);

Then we would get

EXPLAIN QUERY PLAN SELECT t1.RowID FROM t1 LEFT JOIN t2 ON y = b;
0|0|0|SCAN TABLE t1

Therefore, the suggested formula would fail on any non premium key lookup
index where the index field(s) were not declared as NOT NULL.

Strangely (or maybe not)

EXPLAIN QUERY PLAN SELECT t1.RowID FROM t1 LEFT JOIN t2 ON y = b *WHERE y IS
NOT NULL*;

doesn't resolve the problem.




--
Sent from: http://sqlite.1065341.n5.nabble.com/
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users