Min/Max and skip-scan optimizations


Re: Min/Max and skip-scan optimizations

Simon Slavin-3
On 4 Feb 2019, at 9:14pm, James K. Lowden <[hidden email]> wrote:

> As Keith said, SQLite allows ORDER BY in subqueries.  The SQL standard does not.  

True.  But SQLite does not guarantee that the outer query will preserve the inner query's ORDER BY, even if the outer query doesn't have its own ORDER BY.  So be careful.
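Simon's point can be demonstrated with a minimal sketch (hypothetical table and values; only the ORDER BY on the outermost query is guaranteed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(3,), (1,), (2,)])

# An ORDER BY inside the subquery is NOT guaranteed to survive the outer query:
unreliable = con.execute("SELECT x FROM (SELECT x FROM t ORDER BY x)").fetchall()

# Putting the ORDER BY on the outermost query is the only portable way:
reliable = con.execute("SELECT x FROM (SELECT x FROM t) ORDER BY x").fetchall()
print(reliable)  # [(1,), (2,), (3,)]
```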

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
Thank you for your explanations guys. All this makes perfect sense.
I still can't find a solution to my problem though -- write a query that is
guaranteed to return sorted results, in some optimal way.

Any suggestion welcome.

Thank you,
Gerlando

On Mon, 4 Feb 2019 at 22:24, Simon Slavin <[hidden email]> wrote:

> [...]

Re: Min/Max and skip-scan optimizations

Simon Slavin-3
On 5 Feb 2019, at 8:00am, Gerlando Falauto <[hidden email]> wrote:

> Thank you for your explanations guys. All this makes perfect sense.
> I still can't find a solution to my problem though -- write a query that is guaranteed to return sorted results, in some optimal way.

Please state your table definition, and desired query including ORDER BY clause.  Please also tell us whether the amount of space taken up by your database file is important.  Then we will tell you how to make SQLite use an efficient way to arrive at your desired result.

It's not possible for us to just tell you the answer outright.  This will involve a few testing steps on your part.  So if you aren't already using the SQLite command-line tool, please install it and learn to run it.

Simon.

Re: Min/Max and skip-scan optimizations

Rowan Worth-2
On Tue, 5 Feb 2019 at 16:06, Simon Slavin <[hidden email]> wrote:

> On 5 Feb 2019, at 8:00am, Gerlando Falauto <[hidden email]>
> wrote:
>
> > Thank you for your explanations guys. All this makes perfect sense.
> > I still can't find a solution to my problem though -- write a query that
> > is guaranteed to return sorted results, in some optimal way.
>
> Please state your table definition, and desired query including ORDER BY
> clause.  Please also tell us whether the amount of space taken up by your
> database file is important.  Then we will tell you how to make SQLite use
> an efficient way to arrive at your desired result.
>

The table definition was literally the first thing in Gerlando's initial
email, and the desired query has also been clarified. But I assume you
didn't actually read the thread before commenting; if you had you would
have also noticed that Gerlando was the first person to note that it isn't
reliable to depend on the order of results coming out of a SELECT which
doesn't have an ORDER BY clause.

IMO it would be great if we could all move on from that well established
fact and focus on the issue Gerlando is trying to raise. We have this query:

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
  AND ts > 1 AND ts < 100000000
ORDER BY source1, source2, ts;

And this index:

CREATE INDEX `sources` ON `rolling` (
    `source1`,
    `source2`,
    `ts`
);

What is stopping sqlite's query planner from taking advantage of the index,
which it has chosen to use for the query, to also satisfy the ORDER BY?
Instead it adds an extra TEMP B-TREE step to sort the results, which slows
things down. Intuitively it seems there's a potential for optimisation
here. Which doesn't mean it's feasible, but it would be a pretty good win
to be able to provide ORDER BY for free in more circumstances so it's worth
considering.
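For readers wanting to reproduce this, a sketch like the following shows how to inspect the plan (the exact plan text varies by SQLite version, so look for "USING INDEX sources" and any "USE TEMP B-TREE" step in the output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rolling (source1 TEXT, source2 TEXT, ts INTEGER, value)")
con.execute("CREATE INDEX sources ON rolling (source1, source2, ts)")

# Ask the planner how it would execute the query under discussion:
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT source1, source2, ts, value
    FROM rolling
    WHERE source1 = 'aaa' AND ts > 1 AND ts < 100000000
    ORDER BY source1, source2, ts
""").fetchall()
for row in plan:
    print(row)
```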

Gerlando, what version of sqlite are you using?

-Rowan

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
Hi Rowan,

thank you for your kind support. You grasped the essence of my questions.
:-)
I'm using SQLite 3.25.00.

Thank you,
Gerlando




On Tue, Feb 5, 2019 at 9:59 AM Rowan Worth <[hidden email]> wrote:

> [...]

Re: Min/Max and skip-scan optimizations

Simon Slavin-3
In reply to this post by Rowan Worth-2
On 5 Feb 2019, at 8:59am, Rowan Worth <[hidden email]> wrote:

> SELECT source1, source2, ts, value
> FROM rolling
> WHERE source1 = 'aaa'
>  AND ts > 1 AND ts < 100000000
> ORDER BY source1, source2, ts;
>
> And this index:
>
> CREATE INDEX `sources` ON `rolling` (
>    `source1`,
>    `source2`,
>    `ts`
> );
>
> What is stopping sqlite's query planner from taking advantage of the index, which it has chosen to use for the query, to also satisfy the ORDER BY?

I suspect that, given the data in the table, the index supplied is not optimal for selecting the correct rows from the table.  SQLite may have decided that it needs to select on the contents of ts first, then source1.

1) Put some typical data in the table.  The table should have an appropriate number of rows, and appropriate values in each field.
2) Add an index for (ts, source1, source2).
3) Add an index for (source1, ts, source2, value).
4) Add an index for (ts, source1, source2, value).
5) Run ANALYZE
6) Use EXPLAIN QUERY PLAN on the above SELECT to find which index SQLite chose.

You now know what SQLite thinks is the best index for the query.  You can delete the other indexes and run VACUUM.
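Simon's six steps can be sketched end to end like this (sample data shape is made up; which index wins depends on your real data, which is the whole point of the exercise):

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rolling (source1 TEXT, source2 TEXT, ts INTEGER, value REAL)")

# 1) typical data (hypothetical shape: a few sources, many timestamps)
rows = [(random.choice("abc"), random.choice("xyz"), t, t * 0.5) for t in range(5000)]
con.executemany("INSERT INTO rolling VALUES (?, ?, ?, ?)", rows)

# 2)-4) the candidate indexes
con.execute("CREATE INDEX idx_a ON rolling (ts, source1, source2)")
con.execute("CREATE INDEX idx_b ON rolling (source1, ts, source2, value)")
con.execute("CREATE INDEX idx_c ON rolling (ts, source1, source2, value)")

# 5) gather statistics
con.execute("ANALYZE")

# 6) see which index the planner picks for the SELECT in question
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT source1, source2, ts, value FROM rolling
    WHERE source1 = 'a' AND ts > 1 AND ts < 100000000
    ORDER BY source1, source2, ts
""").fetchall()
for row in plan:
    print(row)
```

The index named in the plan output is the one to keep; the others can be dropped before running VACUUM.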

Simon.

Re: Min/Max and skip-scan optimizations

R Smith-2


On 2019/02/05 4:46 PM, Simon Slavin wrote:
> On 5 Feb 2019, at 8:59am, Rowan Worth <[hidden email]> wrote:
>
>>
>> What is stopping sqlite's query planner from taking advantage of the index, which it has chosen to use for the query, to also satisfy the ORDER BY?
> I suspect that, given the data in the table, the index supplied is not optimal for selecting the correct rows from the table.  SQLite may have decided that it needs to select on the contents of ts first, then source1.

And to add to this:

An Index is nothing magical and not a save-the-World-from-every-monster
type of device (as newer DB programmers often think). It's an expensive
add-on that provides an ordered binary lookup which, given enough bulk,
will eventually win the efficiency race over the extra computation it
adds. (The more bulk, the more win).
(Some DB programmers, when they see the words "table scan" in any Query
plan, immediately feel as if they have somehow failed to correctly
optimize the query. This is silly - a table scan is often the most
optimal solution).

Add to that the fact that an SQLite TABLE is, in and of itself, nothing
less than a covering Index with row_id as a key (or a custom key for
WITHOUT ROWID tables), and as such it is a rather good Index and a
mostly preferred Index by the query planner (because "using" any other
index adds cycles plus an extra row_id lookup). Due to this, scanning
the table is often more efficient than threading a lookup via another
index into the query plan. Sometimes crafting a new temp BTree Index for
(a) specific field(s) on a materialized set of data might also be judged
faster than re-establishing links between said data and its original Index.

The method by which the query planner decides which other Index (if any)
should be used involves a bit of game theory, typically looking at some
ANALYZE result data along with some tried-and-tested weights in the
decision tree (which I'm not going into since A - It's not important,
and B - I don't know enough of how SQLite does it). If the end score
finds that there is no remarkable advantage to using a separate index,
then it WILL opt to use the more-efficient table scan.

It might be that adding the "ORDER BY" simply pushes one such decision
weight over the edge in this use case; once the table data evolves to be
more complex or hefty, it may again turn to the Index.

To add to another poster's comment: Do not second-guess the
Query-planner, leave it to its devices. You may even be able to
construct a scenario where the specific use case causes the QP to choose
an execution path that is slightly slower than an alternate one, but if
it is looked at in the general case, then other similar query scenarios
might again be faster with that chosen path. Further to this, if you
construct a weird query now to force a path of execution with some gain,
you possibly prohibit it from capitalizing on an even better improvement
that might be inherent to the next SQLite update (possibly thanks to
your very own report here).

If you can demonstrate a true degradation (one that slows down a
significant time slice that trespasses on human-perceptible time) for a
general query, an optimization will surely be considered, but this case,
unless I've misunderstood the severity, does not seem to warrant that.

[PS: this is not a discouragement, it's great to hear of every possible
quirk and make other users aware of a possible query scenario that might
not be optimal - thanks for that, and I'm certain the devs would notice
this, perhaps even get on fixing it right away, or maybe only keep it in
the back of their minds for when the next round of query-planner
refinement happens. I'm simply saying that there is possibly no
satisfying answer to your question right now - best we can do is:
"Sometimes the QP correctly evaluates the best path to be one that is
not obviously best to us, or maybe even worse for a specific case, but
typically better in the general case".]


Cheers,
Ryan



Re: Min/Max and skip-scan optimizations

Gerlando Falauto
Hi Ryan,

first of all thank you for your patience and contribution.

[....]

>
> Add to that the fact that an SQLite TABLE is, in and of itself, nothing
> less than a covering Index with row_id as a key (or a custom key for
> WITHOUT ROWID tables), and as such it is a rather good Index and a
> mostly preferred Index by the query planner (because "using" any other
> index adds cycles plus an extra row_id lookup).


That's quite interesting indeed.
I actually started off with source1,source2,ts as the primary key and for
some reason (which I no longer remember) I thought it would be wise to use
a ROWID and add an index instead.
Please bear in mind this table is nothing more than a continuous log with a
two-columned source identifier and a timestamp.
The use case involves retaining as much data as the storage can possibly
hold (so a bunch of gigabytes).
I could've just used directories and logfiles instead of abusing a
relational database, but I just thought it would be more convenient to issue a
query and use a cursor.

Due to this, scanning
> the table is often more efficient than threading a lookup via another
> index into the query plan. Sometimes crafting a new temp BTree Index for
> (a) specific field(s) on a materialized set of data might also be judged
> faster than re-establishing links between said data and its original Index.
>

Do you think restoring the original primary key (instead of ROWID) and
dropping the index would make any difference?
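The alternative being asked about can be sketched as follows (hypothetical, assuming a table shaped like rolling): making (source1, source2, ts) the PRIMARY KEY of a WITHOUT ROWID table turns the table itself into the clustered index, so the separate sources index is no longer needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE rolling (
        source1 TEXT,
        source2 TEXT,
        ts      INTEGER,
        value,
        PRIMARY KEY (source1, source2, ts)
    ) WITHOUT ROWID
""")
con.execute("INSERT INTO rolling VALUES ('aaa', 'x', 5, 1.0)")

# The primary key b-tree itself now serves as the lookup structure:
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM rolling WHERE source1 = 'aaa'
    ORDER BY source1, source2, ts
""").fetchall()
for row in plan:
    print(row)
```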

>
> The method by which the query planner decides which other Index (if any)
> should be used involves a bit of game theory, typically looking at some
> ANALYZE result data along with with some tried and tested weights in the
> decision tree (which I'm not going into since A - It's not important,
> and B - I don't know enough of how SQLite does it). If the end score
> finds that there is no remarkable advantage to using a separate index,
> then it WILL opt to use the more-efficient table scan.
>

I pre-populated the table with a realistic use case scenario and ran
ANALYZE.
I'm not planning on using ANALYZE on the real system -- though I might
indeed pre-populate sqlite_stat1 with typical values as suggested in the
docs.

It might be that the adding of the "ORDER BY" simply pushes one such
> decision weight over the edge in this use case, and, once the table data
> evolved to be more complex or hefty, it may again turn to the Index.
>

> To add to another poster's comment: Do not second-guess the
> Query-planner, leave it to its devices. You may even be able to
> construct a scenario where the specific use case causes the QP to choose
> an execution path that is slightly slower than an alternate one, but if
> it is looked at in the general case, then other similar query scenarios
> might again be faster with that chosen path. Further to this, if you
> construct a weird query now to force a path of execution with some gain,
> you possibly prohibit it from capitalizing on an even better improvement
> that might be inherent to the next SQLite update (possibly thanks to
> your very own report here).
>
>
I could not agree more.



> If you can demonstrate a true degradation (one that slows down a
> significant time slice that trespasses on human-perceptible time) for a
> general query, an optimization will surely be considered, but this case,
> unless I've misunderstood the severity, does not seem to warrant that.
>

Yes, in the worst case, adding the ORDER BY clause (2 vs. 1, 4 vs. 3) leads
to a perceivable degradation in terms of both seek time (several seconds
vs. milliseconds to get the first row) and occupied disk space.

Keith's approach (5) seems to somehow mitigate the effect by only adding
some +33% execution time (40s vs. 30s), at the cost of a "weird" query --
one I would never have thought of by myself, and whose query plan I don't
quite understand.


> [PS: this is not a discouragement, it's great to hear of every possible
> quirk and make other users aware of a possible query scenario that might
> not be optimal - thanks for that, and I'm certain the devs would notice
> this, perhaps even get on fixing it right away, or maybe only keep it in
> the back of their minds for when the next round of query-planner
> refinement happens. I'm simply saying that there is possibly no
> satisfying answer to your question right now - best we can do is:
> "Sometimes the QP correctly evaluates the best path to be one that is
> not obviously best to us, or maybe even worse for a specific case, but
> typically better in the general case".]
>

As I already said, my use case *is* quite unusual. Definitely not something
you'd normally use a relational database for.
So I'm not surprised it's not the use case SQLite is optimized against.

Thank you everyone for contributing to this conversation!
Gerlando

Re: Min/Max and skip-scan optimizations

Simon Slavin-3
On 5 Feb 2019, at 10:12pm, Gerlando Falauto <[hidden email]> wrote:

> I actually started off with source1,source2,ts as the primary key and for some reason (which I no longer remember) I thought it would be wise to use a ROWID and add an index instead.

That is probably the right solution.  There are reasons to keep the primary key very short, and ROWID is about as short as a SQLite value can be.

> [...]
>
> I pre-populated the table with a realistic use case scenario and ran ANALYZE.
> I'm not planning on using ANALYZE on the real system -- though I might indeed pre-populate sqlite_stat1 with typical values as suggested in the docs.

Having run ANALYZE on your 'typical' data, you can copy the resulting sqlite_stat1 table onto production systems, even though the tables it refers to are empty.  ANALYZE not only analyzes the contents of your tables but also your indexes.  If you add or drop indexes, it's useful to run ANALYZE again.

ANALYZE is the fastest, easiest tool for providing SQLite with the information it needs to pick the best query plan.  As a human you can't possibly hope to match it.  If your system has a yearly maintenance procedure, it's good to include ANALYZE in it.
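The copy described above can be sketched like this (in-memory databases stand in for the real reference and production files; table names are the ones from this thread):

```python
import sqlite3

# Reference database: same schema, populated with typical data, then analyzed.
ref = sqlite3.connect(":memory:")
ref.execute("CREATE TABLE rolling (source1 TEXT, source2 TEXT, ts INTEGER)")
ref.execute("CREATE INDEX sources ON rolling (source1, source2, ts)")
ref.executemany("INSERT INTO rolling VALUES (?, ?, ?)",
                [("a", "x", t) for t in range(1000)])
ref.execute("ANALYZE")  # fills sqlite_stat1
stats = ref.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()

# Production database: same schema, no data yet.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE rolling (source1 TEXT, source2 TEXT, ts INTEGER)")
prod.execute("CREATE INDEX sources ON rolling (source1, source2, ts)")
prod.execute("ANALYZE")  # creates an (empty) sqlite_stat1 table
prod.executemany("INSERT OR REPLACE INTO sqlite_stat1 VALUES (?, ?, ?)", stats)
prod.commit()
# Statistics are read when the database is opened (or when ANALYZE runs),
# so reopen the connection before relying on the copied values.
```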

Simon.

Re: Min/Max and skip-scan optimizations

Keith Medcalf
In reply to this post by Gerlando Falauto

On Tuesday, 5 February, 2019 15:12, Gerlando Falauto wrote:

> I could've just used directories and logfiles instead of abusing
> a relational database but I just thought it would more convenient
> to issue a query and use a cursor.

Well, "abusing a relational database" is the correct terminology, because you have not normalized the data before storing it.  Failure to normalize causes all sorts of problems when you attempt to "retrieve" and "store" data -- if the data is not normalized, you should not really expect anything more performant than simply sequentially scanning sequential log files.

Also, relational databases do not have cursors (cursors are an illusion).  Relational databases operate on "sets" of data all at one get-go.  The appearance of "sequential execution row by row" is merely an artifice of the current limits of computer technology, and what some refer to as "cursors" are merely implementation details or "magic programming" allowing access to a "set" in a row-by-row fashion.

You probably want separate tables for "source1" and for "source2" because these are not keys of the events, they are attributes of the events, and contain duplicate data.  Similarly the timestamp is an attribute of an event, not a key of an event.  The only true key of the event is a pseudo-key conveying the order in which the events happened and nothing more (for which a simple record number or "integer primary key") will suffice.

Now you have to decide whether or not "source2" is dependent on "source1".  This is rather simple: if the same "source2" can occur in combination with several "source1" values, then it is independent.  (This also depends on what you want to do with the data after you have it.)

So now you have the following minimally normalized schema:

create table Source1
(
  id integer primary key,
  name text collate nocase unique
);
create table Source2
(
  id integer primary key,
  name text collate nocase unique
);
create table Events
(
  id integer primary key,
  Source1id integer not null references Source1,
  Source2id integer not null references Source2,
  ts real not null
  --- other data items ---
);
create index EventSource1 on Events (Source1id, Source2id);
create index EventSource2 on Events (Source2id, Source1id);
create index EventTS on Events (ts);

So now if you want all the combinations of source1 and source2 it is simply:

  select source1.name, source2.name
    from source1, source2
order by 1, 2;

If you only want them where there are events then:

  select source1.name, source2.name
    from source1, source2
   where exists (select *
                   from events
                  where events.source1id == source1.id
                    and events.source2id == source2.id)
order by 1, 2;

If you want all the stuff between two ts you can do:

select source1.name, source2.name, events.*
  from source1, source2, events
 where source1.id == events.source1id
   and source2.id == events.source2id
   and events.id between coalesce((select max(id)
                                     from events
                                    where ts <= :StartTime),
                                  (select min(id)
                                     from events))
                     and coalesce((select min(id)
                                     from events
                                     where ts >= :EndTime),
                                  (select max(id)
                                     from events))
order by events.id;

If you only want the data for source2 = 'GATE' you can do the same and add:

   and source2.name == 'GATE'

or probably more efficient:

   and events.source2id == (select id from source2 where name == 'GATE')

and even other combinations:

   and events.source1id in (select id from source1 where name in ('ABA', 'AXX'))
   and events.source2id in (select id from source2 where name == 'GATE')

and so on and so forth, and let the query optimizer decide the best way to actually retrieve the data you requested.


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.





Re: Min/Max and skip-scan optimizations

D Burgess
On Wed, Feb 6, 2019 at 11:26 AM Keith Medcalf <[hidden email]> wrote:
"you have not normalized the data before storing it"

This is true of most of the hundreds, if not thousands, of schema that I
have seen.

Re: Min/Max and skip-scan optimizations

R Smith-2
In reply to this post by Gerlando Falauto


On 2019/02/06 12:12 AM, Gerlando Falauto wrote:
>>
>> The use case involves retaining as much data as the storage can possibly
>> hold (so a bunch of gigabytes).
>> I could've just used directories and logfiles instead of abusing a
>> relational database but I just thought it would more convenient to issue a
>> query and use a cursor.

There is nothing wrong with using a database to store a list of values,
as another poster pointed out, [insert high made-up number]% of schemata
out there are basically just that, but I like Keith's suggestion, since
you did decide to DB it, why not make it nicely relational too?

>> the table is often more efficient than threading a lookup via another
>> index into the query plan. Sometimes crafting a new temp BTree Index for
>> (a) specific field(s) on a materialized set of data might also be judged
>> faster than re-establishing links between said data and its original Index.
>>
> Do you think restoring the original primary key (instead of ROWID) and
> dropping the index would make any difference?

I do think it would make a difference (almost any change would), but I
am not sure it would make all the difference. I would however suggest,
at the very least, to test this and see.


>
>> I pre-populated the table with a realistic use case scenario and ran
>> ANALYZE.
>> I'm not planning on using ANALYZE on the real system -- though I might
>> indeed pre-populate sqlite_stat1 with typical values as suggested in the
>> docs.

This is fine. I would ask -- did your "test" data include a gigabyte or
more of data? The amount, cardinality and shape of the data are all most
important for ANALYZE to provide good information.

>> If you can demonstrate a true degradation //...
> Yes, in the worst case, adding the ORDER BY clause (2 vs.1, 4 vs.3) leads
> to a perceivable degradation in terms of both seek time (several seconds
> vs. milliseconds to get the first row) and occupied disk space.

I must have missed this, apologies -- that is certainly a very real
degradation. Note that the entire query delivery should be taken into
consideration. The first row can often be delivered near instantaneously,
with following rows taking progressively longer. Very often the time it
takes to deliver the first row is compensated for by the time saved
across the subsequent rows. The QP takes this into consideration.

A good example is getting a set of data, say 100 rows, sorting it first
and then just spitting it out from memory. The preparation (aka first
row delivery) will take time, all the rest will be instant. Contrast
that with a query that needs no sorting, it might produce rows as it
scans the table, the first of which might appear instantly (since it's
at the top of the table and satisfies the WHERE clause), but all the
next qualifying rows might take a long while to produce depending on
where they fall within the table. In the end the fully traversed cursor
may take similar amounts of time.

The QP cannot know before-hand how many "hits" it would encounter, so
has to use a basic pre-made guide and/or help from the ANALYZE data to
best guess which route is better - and you can easily construct a
non-usual set of data for which it will choose wrong every time, and for
which "fixing" it will negatively affect more common sets of data.


> As I already said, my use case *is* quite unusual. Definitely not something
> you'd normally use a relational database for.

That does not matter - if the query planner can do better, it should -
unless of course changing the decision tree will negatively affect
another more widely used query case. (This is the hard part to establish.)



Re: Min/Max and skip-scan optimizations

Rowan Worth-2
In reply to this post by Simon Slavin-3
On Tue, 5 Feb 2019 at 22:46, Simon Slavin <[hidden email]> wrote:

> On 5 Feb 2019, at 8:59am, Rowan Worth <[hidden email]> wrote:
>
> > SELECT source1, source2, ts, value
> > FROM rolling
> > WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 100000000
> > ORDER BY source1, source2, ts;
> >
> > And this index:
> >
> > CREATE INDEX `sources` ON `rolling` (
> >    `source1`,
> >    `source2`,
> >    `ts`
> > );
> >
> > What is stopping sqlite's query planner from taking advantage of the
> index, which it has chosen to use for the query, to also satisfy the ORDER
> BY?
>
> I suspect that, given the data in the table, the index supplied is not
> optimal for selecting the correct rows from the table.  SQLite may have
> decided that it needs to select on the contents of ts first, then source1.
>

This seems like a reasonable hypothesis, and explains one of Gerlando's
observations (sqlite _did_ decide to use an index on `ts` in a different
version of the DB). However, the EXPLAIN QUERY PLAN output demonstrates
that it _is_ using the `sources` index when that's the only one available:

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2) AND ts>? AND ts<?)
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

(sorry for omitting this detail; this all comes from the start of the
thread in Gerlando's 3rd email)

What makes the TEMP B-TREE necessary, when sqlite is already using the
perfect index to satisfy the ORDER BY?

-Rowan