Rewriting a query

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Rewriting a query

Hugh Gibson
I have this query:

  SELECT sCommunityID, max(sTransactionID)
  FROM TransactionList
  WHERE sCommunityID in ('a030600001bFi','a030600004KDy', 'a030600008QhK')
  GROUP BY sCommunityID
               
There is an index on (sCommunityID, sTransactionID)

This forces a table scan (perhaps improved in 3.2.6).

I can use a LIMIT 1 clause like this, but only with one community ID:

  SELECT sCommunityID, sTransactionID
  FROM TransactionList
  WHERE sCommunityID  = 'a030600001bFi'
  ORDER BY sTransactionID Desc LIMIT 1

Is there a way of getting the latter to work with multiple sCommunityIDs?

Hugh
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

Hugh Gibson
The following works for me. The Community table has only one entry per
community ID so it's fast to look up.

SELECT sCommunityID, (SELECT sTransactionID
                            FROM TransactionList
                            WHERE sCommunityID = Community.sCommunityID
                            ORDER BY sTransactionID Desc Limit 1) as MaxID
From Community WHERE sCommunityID In ('a030600001bFi','a030600004KDy',
'a030600008QhK')

Is there any way of just supplying a list of records without having to use
another table? I know I can use parameterised queries but I would like to
be able to use a single query to get the data, as it feels like it should
be possible!

Hugh
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

Dan Kennedy
In reply to this post by Hugh Gibson

What happens if you create the index on sCommunityID only? Does
it still do the full table scan?

Also, don't overlook using UNION or UNION ALL, ugly as they
can be. Maybe something like this could be used to avoid creating
a very small temporary table:

SELECT ... FROM (SELECT ... UNION SELECT ... UNION SELECT ...), TransactionList WHERE ...

--- Hugh Gibson <[hidden email]> wrote:

> I have this query:
>
>   SELECT sCommunityID, max(sTransactionID)
>   FROM TransactionList
>   WHERE sCommunityID in ('a030600001bFi','a030600004KDy', 'a030600008QhK')
>   GROUP BY sCommunityID
>                
> There is an index on (sCommunityID, sTransactionID)
>
> This forces a table scan (perhaps improved in 3.2.6).
>
> I can use a LIMIT 1 clause like this, but only with one community ID:
>
>   SELECT sCommunityID, sTransactionID
>   FROM TransactionList
>   WHERE sCommunityID  = 'a030600001bFi'
>   ORDER BY sTransactionID Desc LIMIT 1
>
> Is there a way of getting the latter to work with multiple sCommunityIDs?
>
> Hugh


               
__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

Hugh Gibson
> What happens if you create the index on sCommunityID only? Does
> it still do the full table scan?

A bit difficult to drop the (sCommunityID, sTransactionID) index, as it's
the primary key.

> Also, don't overlook using UNION or UNION ALL, ugly as they
> can be. Maybe something like this could be used to avoid creating
> a very small temporary table:
>
> SELECT ... FROM (SELECT ... UNION SELECT ... UNION SELECT ...),
> TransactionList WHERE ...

Hmmm. Something like this works:

SELECT sCommID, (SELECT sTransactionID
                            FROM TransactionList
                            WHERE sCommunityID = sCommID
                            ORDER BY sTransactionID Desc Limit 1) as MaxID
From (SELECT 'a030600001bFi' As sCommID UNION ALL SELECT 'a030600004KDy'
As sCommID UNION ALL SELECT 'a030600008QhK' As sCommID)

The "explain" output shows that the Community table isn't being touched
now, and in fact it creates a temp table with the hard-coded values in it,
similar to "IN". So the final query may run faster (it's certainly very
fast with the data I have).

I might be working with a few thousand communities. Is there a limit to
the size of queries? In that case the "IN" clause would be better as it
uses less text.

Hugh
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

Robin Breathe
Hugh Gibson wrote:
>> What happens if you create the index on sCommunityID only? Does
>> it still do the full table scan?
>
> A bit difficult to drop the (sCommunityID, sTransactionID) index, as it's
> the primary key.

I'm intrigued. How do you get SQLite to use a multi-column index as it's
primary key (i.e. B-tree hash)? Please elaborate.

Have you investigated the following to see how the optimizer deals with it?

SELECT sCommunityID, max(sTransactionID)
  FROM TransactionList
 GROUP BY sCommunityID
HAVING sCommunityID IN ('a030600001bFi','a030600004KDy', 'a030600008QhK');

I think I'd normally err on keeping a filter table handy (temporary if
you wish):

-- Initialise filter:
CREATE TABLE filter (sCommunityID TEXT);
CREATE INDEX filter_idx ON filter(sCommunityID);

-- Configure filter:
INSERT INTO filter VALUES ('a030600001bFi');
INSERT INTO filter VALUES ('a030600004KDy');
INSERT INTO filter VALUES ('a030600008QhK');

-- Get your results:
SELECT sCommunityID, max(sTransactionID)
  FROM filter NATURAL INNER JOIN TransactionList
 GROUP BY sCommunityID;

-- Clear filter ready for next time:
DELETE FROM filter;

I feel fairly confident that that method would make good use of your
existing index.
You can use a nice sqlite3_bind_text() for those inserts :)
If you really, really wanted you could munge the filter table into a
sequence of UNIONs, but it would be nowhere near as elegant. This method
easily extends to any number of filtered sCommunityIDs, while anything
in a single query is going to start getting "icky".

Probably too late, but I've also found that indexing, grouping and
joining are far faster on INT columns. If you could store sCommunityID
as an INT...

Please correct me if any of the above is junk, I'm still learning :)

Regards,
Robin
--
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
[hidden email]       Tel: +44 1865 483685  Fax: +44 1865 483073


signature.asc (194 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

Hugh Gibson
> I'm intrigued. How do you get SQLite to use a multi-column index as it's
> primary key (i.e. B-tree hash)? Please elaborate.

Simply

CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT
'',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY
(sCommunityID, sTransactionID))

> Have you investigated the following to see how the optimizer deals with
> it?
>
> SELECT sCommunityID, max(sTransactionID)
>   FROM TransactionList
>  GROUP BY sCommunityID
> HAVING sCommunityID IN ('a030600001bFi','a030600004KDy',
> 'a030600008QhK')

I'm not fluent in the VDBE code, but ISTM (and execution time confirms)
that it's doing a table scan first to do the GROUP BY. Then it
creates a temp table with the values from the list, and does a join (I
suppose).

> I think I'd normally err on keeping a filter table handy (temporary if
> you wish):
>
> -- Initialise filter:
> CREATE TABLE filter (sCommunityID TEXT);
> CREATE INDEX filter_idx ON filter(sCommunityID);
>
> -- Configure filter:
> INSERT INTO filter VALUES ('a030600001bFi');
> INSERT INTO filter VALUES ('a030600004KDy');
> INSERT INTO filter VALUES ('a030600008QhK');
>
> -- Get your results:
> SELECT sCommunityID, max(sTransactionID)
>   FROM filter NATURAL INNER JOIN TransactionList
>  GROUP BY sCommunityID;
>
> -- Clear filter ready for next time:
> DELETE FROM filter;

Thread safety is an issue here. Multiple threads may be doing this action.
Also, by the (admittedly contorted) query I am using I can get the VDBE to
create a temporary table for me and insert the data, taking a lot less
time to do it than it would take to run all those queries.
 
> I feel fairly confident that that method would make good use of your
> existing index.

Yes, I'm sure too. See my comment about joining to the Community table
(that still requires the list of values, but it's fast because of the way
the query is constructed).

> You can use a nice sqlite3_bind_text() for those inserts :)

I'm using Python and PySQLite but I can do that through executemany.

> If you really, really wanted you could munge the filter table into a
> sequence of UNIONs, but it would be nowhere near as elegant. This method
> easily extends to any number of filtered sCommunityIDs, while anything
> in a single query is going to start getting "icky".

Yes, hence my question about the max size of queries. It wouldn't be a
problem to create multiple big queries and run them, as the running time
is still O(N). It's just how long the query engine takes to compile the
SQL if it's very long.
 
> Probably too late, but I've also found that indexing, grouping and
> joining are far faster on INT columns. If you could store sCommunityID
> as an INT...

Interesting. It is, indeed, way too late :-)
 
> Please correct me if any of the above is junk, I'm still learning :)

If we stop learning then someone should call the undertaker ;-)

Hugh
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

Robin Breathe
Hugh Gibson wrote:
>> I'm intrigued. How do you get SQLite to use a multi-column index as it's
>> primary key (i.e. B-tree hash)? Please elaborate.
>
> Simply
>
> CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT
> '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY
> (sCommunityID, sTransactionID))

Ah, but it's not used for the B-tree hash (at least not according to the
documentation).

>> Have you investigated the following to see how the optimizer deals with
>> it?
> ...snip...
> I'm not fluent in the VDBE code, but ISTM (and execution time confirms)
> that it's doing a table scan first to do the GROUP BY. Then it
> creates a temp table with the values from the list, and does a join (I
> suppose).

I thought it probably would, but worth trying :)

>> I think I'd normally err on keeping a filter table handy (temporary if
>> you wish):
>>
>> -- Initialise filter:
>> CREATE TABLE filter (sCommunityID TEXT);
>> CREATE INDEX filter_idx ON filter(sCommunityID);
>>
>> -- Configure filter:
>> INSERT INTO filter VALUES ('a030600001bFi');
>> INSERT INTO filter VALUES ('a030600004KDy');
>> INSERT INTO filter VALUES ('a030600008QhK');
>>
>> -- Get your results:
>> SELECT sCommunityID, max(sTransactionID)
>>   FROM filter NATURAL INNER JOIN TransactionList
>>  GROUP BY sCommunityID;
>>
>> -- Clear filter ready for next time:
>> DELETE FROM filter;
>
> Thread safety is an issue here. Multiple threads may be doing this action.
> Also, by the (admittedly contorted) query I am using I can get the VDBE to
> create a temporary table for me and insert the data, taking a lot less
> time to do it than it would take to run all those queries.
You didn't mention threads :)

Robin
--
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
[hidden email]       Tel: +44 1865 483685  Fax: +44 1865 483073


signature.asc (194 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

D. Richard Hipp
In reply to this post by Hugh Gibson
Robin Breathe <[hidden email]> wrote:

> Hugh Gibson wrote:
> >> I'm intrigued. How do you get SQLite to use a multi-column index as it's
> >> primary key (i.e. B-tree hash)? Please elaborate.
> >
> > Simply
> >
> > CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT
> > '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY
> > (sCommunityID, sTransactionID))
>
> Ah, but it's not used for the B-tree hash (at least not according to the
> documentation).
>

The ROWID is always used as the btree key (not hash!) on the main
database btree.  But when you have a PRIMARY KEY a separate index
btree is also created which uses the PRIMARY KEY as its key.

Question:  Why is this important to you?

(Side note: I am experimenting with a new Mail User Agent that
uses SQLite to store all its email messages.  I appologize in advance
if this message is misformatted or otherwise garbled.)
--
D. Richard Hipp <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Rewriting a query

Miha Vrhovnik
In reply to this post by Hugh Gibson
[hidden email] je ob 30.9.2005 12:22:47 napisal(a):

>(Side note: I am experimenting with a new Mail User Agent that
>uses SQLite to store all its email messages.  I appologize in advance
>if this message is misformatted or otherwise garbled.)

Am. Who stole that idea from me?

--
It's time to get rid of your current e-mail client ...
... and start using si.Mail.

http://simail.sourceforge.net/
Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

Dan Kennedy
In reply to this post by D. Richard Hipp


--- [hidden email] wrote:

> Robin Breathe <[hidden email]> wrote:
> > Hugh Gibson wrote:
> > >> I'm intrigued. How do you get SQLite to use a multi-column index as it's
> > >> primary key (i.e. B-tree hash)? Please elaborate.
> > >
> > > Simply
> > >
> > > CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT
> > > '',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY
> > > (sCommunityID, sTransactionID))
> >
> > Ah, but it's not used for the B-tree hash (at least not according to the
> > documentation).
> >
>
> The ROWID is always used as the btree key (not hash!) on the main
> database btree.  But when you have a PRIMARY KEY a separate index
> btree is also created which uses the PRIMARY KEY as its key.
>
> Question:  Why is this important to you?

I am not he, but if I were looking for ways to improve the file format
that's something I'd try to work in too. Both to save space and speed
things up.

CREATE TABLE(a PRIMARY KEY, b);

creates two btree structures:

(a || oid) -> NULL    (the index)
(oid) -> (a || b)     (the table)

So for every row, there are two copies of both "a" and "oid". Depending
on your schema, the space consumed by the table is from 0-100% more than
if we were able to create a single btree:

(a || oid) -> (b)

or even drop the oid altogether, it's not part of SQL anyway (is it?):

(a) -> (b)

As well as saving space, in the most common case an UPDATE or DELETE
would have to modify one less tree, and some SELECTs would open one
less tree structure.

Tricky to retain backward compatibility though.


               
__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
Reply | Threaded
Open this post in threaded view
|

Re: Re: Re: Rewriting a query

D. Richard Hipp
In reply to this post by Hugh Gibson
"Miha Vrhovnik"<[hidden email]> wrote:
> Subject: =?ISO-8859-1?Q?Re=3A=20Re=3A=20=5Bsqlite=5D=20Rewriting=20a=20query?=
>
...

> It's time to get rid of your current e-mail client ...
> ... and start using si.Mail.
>
> http://simail.sourceforge.net/

What RFC do I need to read to figure out how to decode the Subject
line (presumably inserted by si.Mail)?
--
D. Richard Hipp <[hidden email]>

Reply | Threaded
Open this post in threaded view
|

Re: Re: Re: Rewriting a query

Igor Tandetnik
[hidden email] wrote:
> "Miha Vrhovnik"<[hidden email]> wrote:
>> Subject:
>> =?ISO-8859-1?Q?Re=3A=20Re=3A=20=5Bsqlite=5D=20Rewriting=20a=20query?=
>
> What RFC do I need to read to figure out how to decode the Subject
> line (presumably inserted by si.Mail)?

RFC 2047 "MIME (Multipurpose Internet Mail Extensions) Part Three:
Message Header Extensions for Non-ASCII Text."

Igor Tandetnik

Reply | Threaded
Open this post in threaded view
|

Re: Rewriting a query

John LeSueur
In reply to this post by D. Richard Hipp
[hidden email] wrote:

>Robin Breathe <[hidden email]> wrote:
>  
>
>>Hugh Gibson wrote:
>>    
>>
>>>>I'm intrigued. How do you get SQLite to use a multi-column index as it's
>>>>primary key (i.e. B-tree hash)? Please elaborate.
>>>>        
>>>>
>>>Simply
>>>
>>>CREATE TABLE TransactionList (sTransactionID Text(13) DEFAULT
>>>'',sCommunityID Text(13) DEFAULT '',sObject Text(13) , PRIMARY KEY
>>>(sCommunityID, sTransactionID))
>>>      
>>>
>>Ah, but it's not used for the B-tree hash (at least not according to the
>>documentation).
>>
>>    
>>
>
>The ROWID is always used as the btree key (not hash!) on the main
>database btree.  But when you have a PRIMARY KEY a separate index
>btree is also created which uses the PRIMARY KEY as its key.
>
>Question:  Why is this important to you?
>
>(Side note: I am experimenting with a new Mail User Agent that
>uses SQLite to store all its email messages.  I appologize in advance
>if this message is misformatted or otherwise garbled.)
>--
>D. Richard Hipp <[hidden email]>
>
>  
>
I thought you might want to know that your new agent doesn't add
References: <message-id> or In-Reply-To: <message-id>.
This is not a big deal, but it does break discussion threading a little.

John