Min/Max and skip-scan optimizations

classic Classic list List threaded Threaded
33 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Min/Max and skip-scan optimizations

Gerlando Falauto
Hi,

I have a database table looking like this:

CREATE TABLE `rolling` (
    `source1`    TEXT NOT NULL,
    `source2`    TEXT NOT NULL,
    `ts`    INTEGER NOT NULL,
    `value`    TEXT
);

CREATE INDEX `sources` ON `rolling` (
    `source1`,
    `source2`,
    `ts`
);

INSERT INTO rolling
    WITH RECURSIVE
      src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
      src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL
VALUES("X3") UNION ALL VALUES("X4") ),
      cnt( ts, value) AS (
      VALUES( 0, "ZZZZ")
        UNION ALL
      SELECT ts+1, value FROM cnt LIMIT 1000000)

    select src1.source1, src2.source2, cnt.* from src1, src2, cnt;


So the table looks like the following

aaa|X1|0|ZZZZ
aaa|X1|1|ZZZZ
....
aaa|X1|999998|ZZZZ
aaa|X1|999999|ZZZZ
aaa|X2|0|ZZZZ
aaa|X2|1|ZZZZ
...
aaa|X2|999998|ZZZZ
aaa|X2|999999|ZZZZ
...
aaa|X4|999999|ZZZZ
bbb|X1|0|ZZZZ
bbb|X1|1|ZZZZ
bbb|X1|2|ZZZZ
...
bbb|X4|999999|ZZZZ

So we have 2*4*1M = 8M rows, all indexed. The first two text columns
(source1, source2) have a very limited set of possible values, while the
third numeric column is a timestamp.
So it's essentially a partitioned logbook, with a very small number of
partitions.

Given this particular structure, I would expect indexing to be extremely
helpful, thanks to covering indexes, min/max and skip-scan optimizations.

Below as some example queries that I would expect to be taking advantage of
indexing.
They're however sometimes not, so I'd like some help understanding if I'm
getting something wrong.

1)  SELECT DISTINCT source1, source2 FROM rolling;

Provided the table has been analyzed, this will run real fast.
Here I'm essentially try to get a recap of all distinct values for the
first two columns of an index.
So ideally I would just scan the covering index, skipping to the next
values for the first two columns.
I don't really understand why analyze is a precondition for this
optimization to kick in, however I'm pretty happy about this.

SUGGESTION: Perhaps this could also be added as an example in the
description of the skip-scan optimization?

2) SELECT * FROM rolling WHERE ts < 100;

Provided the table has been analyzed, this will run real fast as well.
I believe the skip-scan optimization is responsible for making this
instantaneous, since this looks like one of the canonical examples.

3) SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")

Again, this will be almost instantaneous, and works regardless of whether
the table has been analyzed or not.

So it looks like the min/max optimization:

 >> Queries that contain a single MIN() or MAX() aggregate function whose
argument is the left-most column of an index
 >> might be satisfied by doing a single index lookup rather than by
scanning the entire table.

Here the argument of the MIN() function is technically *not* the left-most
column of an index, though imposing a certain
value for the first two columns kind-of makes it the left-most one.

SUGGESTION: Perhaps this could also be added as an example in the
description of the min-max optimization?

QUESTION: this feature -- locking tha value for left-most columns of an
index, I believe it's WHERE CLAUSE analysis --
looks like a general feature does it apply to other cases as well?

4) SELECT MIN(ts), MAX(ts) FROM rolling WHERE (source1,source2)=("aaa",
"X2")

This will *NOT* be optimized. I understand by reading from previous
messages in this group that this is a
known limitation of the Min/Max optimization. There are however known
workarounds for it, like:

SELECT (SELECT MIN(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2")),
(SELECT MAX(ts) FROM rolling WHERE (source1,source2)=("aaa", "X2"));

SUGGESTION: Perhaps this could be added in the documentation?

5) SELECT source1,source2,MIN(ts) FROM rolling GROUP BY source1, source2

Here I would expect the query to be optimized somehow by taking advantage
of both skip-scan and min/max optimizations,
but I could not manage to find a way.

This could be easily done in the application code by just nesting two
separate queries:

SELECT DISTINCT source1, source2 FROM rolling; -- This will return the set
of sources available and will be quite fast

SELECT MIN(ts) FROM rolling WHERE (source1,source2)=(?, ?) -- This will
manually filter by the results returned by the previous one

I tried everything that came to mind to run this as a single query,
including ORDER BY, INDEXED BY, subqueries, joins, window functions,
but everything just seemed to make things worse.

QUESTION: Is there any way to force the query engine to take advantage of
both optimizations?

Thanks in advance!
Gerlando
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Keith Medcalf

Do you perhaps want this:

select source1,
       source2,
       (
        select min(ts)
          from rolling
         where source1 = x.source1
           and source2 = x.source2
       )
  from (
        select distinct source1,
                        source2
          from rolling
       ) as x;

SQLite version 3.27.0 2019-01-28 00:42:06
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .timer on
sqlite> CREATE TABLE `rolling` (
   ...>     `source1`    TEXT NOT NULL,
   ...>     `source2`    TEXT NOT NULL,
   ...>     `ts`    INTEGER NOT NULL,
   ...>     `value`    TEXT
   ...> );
Run Time: real 0.002 user 0.000000 sys 0.000000
sqlite>
sqlite> CREATE INDEX `sources` ON `rolling` (
   ...>     `source1`,
   ...>     `source2`,
   ...>     `ts`
   ...> );
Run Time: real 0.001 user 0.000000 sys 0.000000
sqlite>
sqlite> INSERT INTO rolling
   ...>     WITH RECURSIVE
   ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
   ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL
   ...> VALUES("X3") UNION ALL VALUES("X4") ),
   ...>       cnt( ts, value) AS (
   ...>       VALUES( 0, "ZZZZ")
   ...>         UNION ALL
   ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
   ...>
   ...>     select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
Run Time: real 8.920 user 8.843750 sys 0.078125
sqlite>
sqlite> analyze;
Run Time: real 1.285 user 1.281250 sys 0.000000
sqlite> .eqp full
sqlite>
sqlite> select source1,
   ...>        source2,
   ...>        (
   ...>         select min(ts)
   ...>           from rolling
   ...>          where source1 = x.source1
   ...>            and source2 = x.source2
   ...>        )
   ...>   from (
   ...>         select distinct source1,
   ...>                         source2
   ...>           from rolling
   ...>        ) as x;
QUERY PLAN
|--CO-ROUTINE 2
|  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
|--SCAN SUBQUERY 2 AS x (~7864320 rows)
`--CORRELATED SCALAR SUBQUERY 1
   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     64    0                    00  Start at 64
1     InitCoroutine  1     23    2                    00  x
2     Null           1     4     0                    08  r[4]=NULL
3     OpenRead       4     3     0     k(4,,,,)       00  root=3 iDb=0; sources
4     ColumnsUsed    4     0     0     3              00
5     Explain        5     0     0     SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)  00
6     Noop           0     0     0                    00  Begin WHERE-loop0: rolling
7     Rewind         4     21    2     0              00
8         Noop           0     0     0                    00  Begin WHERE-core
9         Column         4     0     2                    00  r[2]=rolling.source1
10        Column         4     1     3                    00  r[3]=rolling.source2
11        Ne             2     13    4     (BINARY)       80  if r[4]!=r[2] goto 13
12        Eq             3     20    5     (BINARY)       80  if r[5]==r[3] goto 20
13        Copy           2     4     1                    00  r[4..5]=r[2..3]
14        Yield          1     0     0                    00
15        Noop           0     0     0                    00  End WHERE-core
16        Column         4     0     6                    00  r[6]=
17        Column         4     1     7                    00  r[7]=
18        SeekGT         4     21    6     2              00  key=r[6..7]
19      Goto           1     8     0                    00
20    Next           4     8     0                    01
21    Noop           0     0     0                    00  End WHERE-loop0: rolling
22    EndCoroutine   1     0     0                    00
23    Explain        23    0     0     SCAN SUBQUERY 2 AS x (~7864320 rows)  00
24    Noop           0     0     0                    00  Begin WHERE-loop0: x
25    InitCoroutine  1     0     2                    00
26      Yield          1     62    0                    00  next row of x
27      Noop           0     0     0                    00  Begin WHERE-core
28      Copy           2     9     0                    00  r[9]=r[2]; x.source1
29      Copy           3     10    0                    00  r[10]=r[3]; x.source2
30      Null           0     12    12                   00  r[12..12]=NULL; Init subquery result
31      Integer        1     13    0                    00  r[13]=1; LIMIT counter
32      Null           0     14    15                   00  r[14..15]=NULL
33      OpenRead       5     3     0     k(4,,,,)       00  root=3 iDb=0; sources
34      ColumnsUsed    5     0     0     7              00
35      Explain        35    0     0     SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)  00
36      Noop           0     0     0                    00  Begin WHERE-loop0: rolling
37      Copy           2     16    0                    00  r[16]=r[2]; x.source1
38      Copy           3     17    0                    00  r[17]=r[3]; x.source2
39      CursorHint     5     0     0     AND(EQ(c0,r[16]),EQ(c1,r[17]))  00
40      Copy           2     18    0                    00  r[18]=r[2]; x.source1
41      IsNull         18    54    0                    00  if r[18]==NULL goto 54
42      Copy           3     19    0                    00  r[19]=r[3]; x.source2
43      IsNull         19    54    0                    00  if r[19]==NULL goto 54
44      Null           0     20    0                    00  r[20]=NULL
45      SeekGT         5     54    18    3              00  key=r[18..20]
46        IdxGT          5     54    18    2              00  key=r[18..19]
47        Noop           0     0     0                    00  Begin WHERE-core
48        Column         5     2     21                   00  r[21]=rolling.ts
49        CollSeq        0     0     0     (BINARY)       00
50        AggStep        0     21    14    min(1)         01  accum=r[14] step(r[21])
51        Goto           0     55    0                    00  min() by index
52        Noop           0     0     0                    00  End WHERE-core
53      Next           5     46    0                    00
54      Noop           0     0     0                    00  End WHERE-loop0: rolling
55      AggFinal       14    1     0     min(1)         00  accum=r[14] N=1
56      Copy           14    12    0                    00  r[12]=r[14]
57      DecrJumpZero   13    58    0                    00  if (--r[13])==0 goto 58
58      Copy           12    11    0                    00  r[11]=r[12]
59      ResultRow      9     3     0                    00  output=r[9..11]
60      Noop           0     0     0                    00  End WHERE-core
61    Goto           0     26    0                    00
62    Noop           0     0     0                    00  End WHERE-loop0: x
63    Halt           0     0     0                    00
64    Transaction    0     0     3     0              01  usesStmtJournal=0
65    Goto           0     1     0                    00
aaa|X1|0
aaa|X2|0
aaa|X3|0
aaa|X4|0
bbb|X1|0
bbb|X2|0
bbb|X3|0
bbb|X4|0
Run Time: real 0.134 user 0.000000 sys 0.000000
sqlite>

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
YES! Thank you!
Many thanks for the ".eqp full" tip also, that really explains a lot
(though I don't really understand any of it yet).

Have a great day!
Gerlando


On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[hidden email]> wrote:

>
> Do you perhaps want this:
>
> select source1,
>        source2,
>        (
>         select min(ts)
>           from rolling
>          where source1 = x.source1
>            and source2 = x.source2
>        )
>   from (
>         select distinct source1,
>                         source2
>           from rolling
>        ) as x;
>
> SQLite version 3.27.0 2019-01-28 00:42:06
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> .timer on
> sqlite> CREATE TABLE `rolling` (
>    ...>     `source1`    TEXT NOT NULL,
>    ...>     `source2`    TEXT NOT NULL,
>    ...>     `ts`    INTEGER NOT NULL,
>    ...>     `value`    TEXT
>    ...> );
> Run Time: real 0.002 user 0.000000 sys 0.000000
> sqlite>
> sqlite> CREATE INDEX `sources` ON `rolling` (
>    ...>     `source1`,
>    ...>     `source2`,
>    ...>     `ts`
>    ...> );
> Run Time: real 0.001 user 0.000000 sys 0.000000
> sqlite>
> sqlite> INSERT INTO rolling
>    ...>     WITH RECURSIVE
>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2")
> UNION ALL
>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
>    ...>       cnt( ts, value) AS (
>    ...>       VALUES( 0, "ZZZZ")
>    ...>         UNION ALL
>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
>    ...>
>    ...>     select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
> Run Time: real 8.920 user 8.843750 sys 0.078125
> sqlite>
> sqlite> analyze;
> Run Time: real 1.285 user 1.281250 sys 0.000000
> sqlite> .eqp full
> sqlite>
> sqlite> select source1,
>    ...>        source2,
>    ...>        (
>    ...>         select min(ts)
>    ...>           from rolling
>    ...>          where source1 = x.source1
>    ...>            and source2 = x.source2
>    ...>        )
>    ...>   from (
>    ...>         select distinct source1,
>    ...>                         source2
>    ...>           from rolling
>    ...>        ) as x;
> QUERY PLAN
> |--CO-ROUTINE 2
> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
> `--CORRELATED SCALAR SUBQUERY 1
>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND
> source2=?) (~983040 rows)
> addr  opcode         p1    p2    p3    p4             p5  comment
> ----  -------------  ----  ----  ----  -------------  --  -------------
> 0     Init           0     64    0                    00  Start at 64
> 1     InitCoroutine  1     23    2                    00  x
> 2     Null           1     4     0                    08  r[4]=NULL
> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3 iDb=0;
> sources
> 4     ColumnsUsed    4     0     0     3              00
> 5     Explain        5     0     0     SCAN TABLE rolling USING COVERING
> INDEX sources (~7864320 rows)  00
> 6     Noop           0     0     0                    00  Begin
> WHERE-loop0: rolling
> 7     Rewind         4     21    2     0              00
> 8         Noop           0     0     0                    00  Begin
> WHERE-core
> 9         Column         4     0     2                    00
> r[2]=rolling.source1
> 10        Column         4     1     3                    00
> r[3]=rolling.source2
> 11        Ne             2     13    4     (BINARY)       80  if
> r[4]!=r[2] goto 13
> 12        Eq             3     20    5     (BINARY)       80  if
> r[5]==r[3] goto 20
> 13        Copy           2     4     1                    00
> r[4..5]=r[2..3]
> 14        Yield          1     0     0                    00
> 15        Noop           0     0     0                    00  End
> WHERE-core
> 16        Column         4     0     6                    00  r[6]=
> 17        Column         4     1     7                    00  r[7]=
> 18        SeekGT         4     21    6     2              00  key=r[6..7]
> 19      Goto           1     8     0                    00
> 20    Next           4     8     0                    01
> 21    Noop           0     0     0                    00  End WHERE-loop0:
> rolling
> 22    EndCoroutine   1     0     0                    00
> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x (~7864320
> rows)  00
> 24    Noop           0     0     0                    00  Begin
> WHERE-loop0: x
> 25    InitCoroutine  1     0     2                    00
> 26      Yield          1     62    0                    00  next row of x
> 27      Noop           0     0     0                    00  Begin
> WHERE-core
> 28      Copy           2     9     0                    00  r[9]=r[2];
> x.source1
> 29      Copy           3     10    0                    00  r[10]=r[3];
> x.source2
> 30      Null           0     12    12                   00
> r[12..12]=NULL; Init subquery result
> 31      Integer        1     13    0                    00  r[13]=1; LIMIT
> counter
> 32      Null           0     14    15                   00  r[14..15]=NULL
> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3 iDb=0;
> sources
> 34      ColumnsUsed    5     0     0     7              00
> 35      Explain        35    0     0     SEARCH TABLE rolling USING
> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)  00
> 36      Noop           0     0     0                    00  Begin
> WHERE-loop0: rolling
> 37      Copy           2     16    0                    00  r[16]=r[2];
> x.source1
> 38      Copy           3     17    0                    00  r[17]=r[3];
> x.source2
> 39      CursorHint     5     0     0     AND(EQ(c0,r[16]),EQ(c1,r[17]))  00
> 40      Copy           2     18    0                    00  r[18]=r[2];
> x.source1
> 41      IsNull         18    54    0                    00  if r[18]==NULL
> goto 54
> 42      Copy           3     19    0                    00  r[19]=r[3];
> x.source2
> 43      IsNull         19    54    0                    00  if r[19]==NULL
> goto 54
> 44      Null           0     20    0                    00  r[20]=NULL
> 45      SeekGT         5     54    18    3              00  key=r[18..20]
> 46        IdxGT          5     54    18    2              00  key=r[18..19]
> 47        Noop           0     0     0                    00  Begin
> WHERE-core
> 48        Column         5     2     21                   00
> r[21]=rolling.ts
> 49        CollSeq        0     0     0     (BINARY)       00
> 50        AggStep        0     21    14    min(1)         01  accum=r[14]
> step(r[21])
> 51        Goto           0     55    0                    00  min() by
> index
> 52        Noop           0     0     0                    00  End
> WHERE-core
> 53      Next           5     46    0                    00
> 54      Noop           0     0     0                    00  End
> WHERE-loop0: rolling
> 55      AggFinal       14    1     0     min(1)         00  accum=r[14] N=1
> 56      Copy           14    12    0                    00  r[12]=r[14]
> 57      DecrJumpZero   13    58    0                    00  if
> (--r[13])==0 goto 58
> 58      Copy           12    11    0                    00  r[11]=r[12]
> 59      ResultRow      9     3     0                    00  output=r[9..11]
> 60      Noop           0     0     0                    00  End WHERE-core
> 61    Goto           0     26    0                    00
> 62    Noop           0     0     0                    00  End WHERE-loop0:
> x
> 63    Halt           0     0     0                    00
> 64    Transaction    0     0     3     0              01  usesStmtJournal=0
> 65    Goto           0     1     0                    00
> aaa|X1|0
> aaa|X2|0
> aaa|X3|0
> aaa|X4|0
> bbb|X1|0
> bbb|X2|0
> bbb|X3|0
> bbb|X4|0
> Run Time: real 0.134 user 0.000000 sys 0.000000
> sqlite>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
Hi,
it's me again, struggling with indices once again.
What I'm now trying to do is filter on source1 and by range of timestamp.
Results should be naturally ordered by source1, source2,ts.

1)

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
AND ts > 1 AND ts < 10;

QUERY PLAN
`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)

This looks pretty OK on the sample dataset.
For some reason though this doesn't really work like that with the real
dataset (which has an extra index on ts only),
and that seems to kick in instead of the real index.
So the query plan comes out completely different, and rows are not
naturally sorted by source1, source2, as I'd like.
[I know I can use "+ts" and/or drop the "ts" index altogether, but I'm
trying to make a point here...]
So I add an extra ORDER BY clause:

2)

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
  AND ts > 1 AND ts < 100000000
ORDER BY source1, source2, ts;

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

Here I don't really understand why a TEMP B-TREE is required at all.
Apparently, this adds extra processing at the end so I don't start pulling
any rows until much later.
If the index is used, data would be *naturally* sorted by its key, so I
don't really understand.
So I recur to a subquery:

3)

SELECT source1, source2, ts, value
FROM rolling
WHERE ((source1, source2) in
        (SELECT DISTINCT source1, source2 from rolling where source1='aaa')
  AND ts > 1 AND ts < 10)

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? AND
ts>? AND ts<?)
`--LIST SUBQUERY
   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)

This also seems to yield the right results, in the right order.
But the order looks pretty much like a coincidence to me.
So I'd rather explicitly state the sorting I'd like:

4)

SELECT source1, source2, ts, value
FROM rolling
WHERE ((source1, source2) in
        (SELECT DISTINCT source1, source2 from rolling where source1='aaa')
  AND ts > 1 AND ts < 10)
ORDER BY source1, source2, ts

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? AND
ts>? AND ts<?)
|--LIST SUBQUERY
|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
`--USE TEMP B-TREE FOR ORDER BY

Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like some
global sorting
which looks even worse than case 2.

What am I doing wrong? Is this expected?
I just don't seem to be able to get what would have been a pretty trivial
task with
last-century technologies (thinking of Paradox, dBase, CA-Clipper)...

Thanks in advance!
Gerlando


On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto <[hidden email]>
wrote:

> YES! Thank you!
> Many thanks for the ".eqp full" tip also, that really explains a lot
> (though I don't really understand any of it yet).
>
> Have a great day!
> Gerlando
>
>
> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[hidden email]> wrote:
>
>>
>> Do you perhaps want this:
>>
>> select source1,
>>        source2,
>>        (
>>         select min(ts)
>>           from rolling
>>          where source1 = x.source1
>>            and source2 = x.source2
>>        )
>>   from (
>>         select distinct source1,
>>                         source2
>>           from rolling
>>        ) as x;
>>
>> SQLite version 3.27.0 2019-01-28 00:42:06
>> Enter ".help" for usage hints.
>> Connected to a transient in-memory database.
>> Use ".open FILENAME" to reopen on a persistent database.
>> sqlite> .timer on
>> sqlite> CREATE TABLE `rolling` (
>>    ...>     `source1`    TEXT NOT NULL,
>>    ...>     `source2`    TEXT NOT NULL,
>>    ...>     `ts`    INTEGER NOT NULL,
>>    ...>     `value`    TEXT
>>    ...> );
>> Run Time: real 0.002 user 0.000000 sys 0.000000
>> sqlite>
>> sqlite> CREATE INDEX `sources` ON `rolling` (
>>    ...>     `source1`,
>>    ...>     `source2`,
>>    ...>     `ts`
>>    ...> );
>> Run Time: real 0.001 user 0.000000 sys 0.000000
>> sqlite>
>> sqlite> INSERT INTO rolling
>>    ...>     WITH RECURSIVE
>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb")
>> ),
>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2")
>> UNION ALL
>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
>>    ...>       cnt( ts, value) AS (
>>    ...>       VALUES( 0, "ZZZZ")
>>    ...>         UNION ALL
>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
>>    ...>
>>    ...>     select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
>> Run Time: real 8.920 user 8.843750 sys 0.078125
>> sqlite>
>> sqlite> analyze;
>> Run Time: real 1.285 user 1.281250 sys 0.000000
>> sqlite> .eqp full
>> sqlite>
>> sqlite> select source1,
>>    ...>        source2,
>>    ...>        (
>>    ...>         select min(ts)
>>    ...>           from rolling
>>    ...>          where source1 = x.source1
>>    ...>            and source2 = x.source2
>>    ...>        )
>>    ...>   from (
>>    ...>         select distinct source1,
>>    ...>                         source2
>>    ...>           from rolling
>>    ...>        ) as x;
>> QUERY PLAN
>> |--CO-ROUTINE 2
>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
>> `--CORRELATED SCALAR SUBQUERY 1
>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND
>> source2=?) (~983040 rows)
>> addr  opcode         p1    p2    p3    p4             p5  comment
>> ----  -------------  ----  ----  ----  -------------  --  -------------
>> 0     Init           0     64    0                    00  Start at 64
>> 1     InitCoroutine  1     23    2                    00  x
>> 2     Null           1     4     0                    08  r[4]=NULL
>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3 iDb=0;
>> sources
>> 4     ColumnsUsed    4     0     0     3              00
>> 5     Explain        5     0     0     SCAN TABLE rolling USING COVERING
>> INDEX sources (~7864320 rows)  00
>> 6     Noop           0     0     0                    00  Begin
>> WHERE-loop0: rolling
>> 7     Rewind         4     21    2     0              00
>> 8         Noop           0     0     0                    00  Begin
>> WHERE-core
>> 9         Column         4     0     2                    00
>> r[2]=rolling.source1
>> 10        Column         4     1     3                    00
>> r[3]=rolling.source2
>> 11        Ne             2     13    4     (BINARY)       80  if
>> r[4]!=r[2] goto 13
>> 12        Eq             3     20    5     (BINARY)       80  if
>> r[5]==r[3] goto 20
>> 13        Copy           2     4     1                    00
>> r[4..5]=r[2..3]
>> 14        Yield          1     0     0                    00
>> 15        Noop           0     0     0                    00  End
>> WHERE-core
>> 16        Column         4     0     6                    00  r[6]=
>> 17        Column         4     1     7                    00  r[7]=
>> 18        SeekGT         4     21    6     2              00  key=r[6..7]
>> 19      Goto           1     8     0                    00
>> 20    Next           4     8     0                    01
>> 21    Noop           0     0     0                    00  End
>> WHERE-loop0: rolling
>> 22    EndCoroutine   1     0     0                    00
>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x (~7864320
>> rows)  00
>> 24    Noop           0     0     0                    00  Begin
>> WHERE-loop0: x
>> 25    InitCoroutine  1     0     2                    00
>> 26      Yield          1     62    0                    00  next row of x
>> 27      Noop           0     0     0                    00  Begin
>> WHERE-core
>> 28      Copy           2     9     0                    00  r[9]=r[2];
>> x.source1
>> 29      Copy           3     10    0                    00  r[10]=r[3];
>> x.source2
>> 30      Null           0     12    12                   00
>> r[12..12]=NULL; Init subquery result
>> 31      Integer        1     13    0                    00  r[13]=1;
>> LIMIT counter
>> 32      Null           0     14    15                   00  r[14..15]=NULL
>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3 iDb=0;
>> sources
>> 34      ColumnsUsed    5     0     0     7              00
>> 35      Explain        35    0     0     SEARCH TABLE rolling USING
>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)  00
>> 36      Noop           0     0     0                    00  Begin
>> WHERE-loop0: rolling
>> 37      Copy           2     16    0                    00  r[16]=r[2];
>> x.source1
>> 38      Copy           3     17    0                    00  r[17]=r[3];
>> x.source2
>> 39      CursorHint     5     0     0     AND(EQ(c0,r[16]),EQ(c1,r[17]))
>> 00
>> 40      Copy           2     18    0                    00  r[18]=r[2];
>> x.source1
>> 41      IsNull         18    54    0                    00  if
>> r[18]==NULL goto 54
>> 42      Copy           3     19    0                    00  r[19]=r[3];
>> x.source2
>> 43      IsNull         19    54    0                    00  if
>> r[19]==NULL goto 54
>> 44      Null           0     20    0                    00  r[20]=NULL
>> 45      SeekGT         5     54    18    3              00  key=r[18..20]
>> 46        IdxGT          5     54    18    2              00
>> key=r[18..19]
>> 47        Noop           0     0     0                    00  Begin
>> WHERE-core
>> 48        Column         5     2     21                   00
>> r[21]=rolling.ts
>> 49        CollSeq        0     0     0     (BINARY)       00
>> 50        AggStep        0     21    14    min(1)         01  accum=r[14]
>> step(r[21])
>> 51        Goto           0     55    0                    00  min() by
>> index
>> 52        Noop           0     0     0                    00  End
>> WHERE-core
>> 53      Next           5     46    0                    00
>> 54      Noop           0     0     0                    00  End
>> WHERE-loop0: rolling
>> 55      AggFinal       14    1     0     min(1)         00  accum=r[14]
>> N=1
>> 56      Copy           14    12    0                    00  r[12]=r[14]
>> 57      DecrJumpZero   13    58    0                    00  if
>> (--r[13])==0 goto 58
>> 58      Copy           12    11    0                    00  r[11]=r[12]
>> 59      ResultRow      9     3     0                    00
>> output=r[9..11]
>> 60      Noop           0     0     0                    00  End WHERE-core
>> 61    Goto           0     26    0                    00
>> 62    Noop           0     0     0                    00  End
>> WHERE-loop0: x
>> 63    Halt           0     0     0                    00
>> 64    Transaction    0     0     3     0              01
>> usesStmtJournal=0
>> 65    Goto           0     1     0                    00
>> aaa|X1|0
>> aaa|X2|0
>> aaa|X3|0
>> aaa|X4|0
>> bbb|X1|0
>> bbb|X2|0
>> bbb|X3|0
>> bbb|X4|0
>> Run Time: real 0.134 user 0.000000 sys 0.000000
>> sqlite>
>>
>> ---
>> The fact that there's a Highway to Hell but only a Stairway to Heaven
>> says a lot about anticipated traffic volume.
>>
>>
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [hidden email]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Simon Slavin-3
On 2 Feb 2019, at 10:19pm, Gerlando Falauto <[hidden email]> wrote:

> SELECT source1, source2, ts, value
> FROM rolling
> WHERE source1 = 'aaa'
> AND ts > 1 AND ts < 10;
>
> [...snip...]

> So I add an extra ORDER BY clause:

Concentrate on the results you want.  Don't try to 'game' the library.  You don't understand how it thinks, and nor do I.  Do you want the things in a certain order ?  If not, don't ask for it.

To find the best results for the above query,

1) Put a typical amount of typical data into the table
2) Create two indexes:
    first index (source1, ts)
    second index (ts, source1)
3) Execute the command ANALYZE.
4) Execute the query you actually want (presumably without ORDER BY) using EXPLAIN QUERY PLAN.

Find what it does and get back to us.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Tim Streater-3
In reply to this post by Gerlando Falauto
On 02 Feb 2019, at 22:19, Gerlando Falauto <[hidden email]> wrote:

> What I'm now trying to do is filter on source1 and by range of timestamp.
> Results should be naturally ordered by source1, source2,ts.

Not unless you use an ORDER BY.


--
Cheers  --  Tim
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Simon Slavin-3
In reply to this post by Gerlando Falauto
On 2 Feb 2019, at 10:19pm, Gerlando Falauto <[hidden email]> wrote:

> Results should be naturally ordered by source1, source2,ts.

[Sorry, I missed this the first time.  Thanks, Tim.]

Sorry, no.  You're making assumptions about how SQLite works internally.  If you want your results sorted, ask for them sorted.  If you don't, don't.

Note that if you don't ask for them sorted and they come out sorted by good luck, there's no guarantee that they'll be sorted if you put different data in the table, or the same data with the rows in a different order, or if you update to a later version of SQLite.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Keith Medcalf
In reply to this post by Gerlando Falauto

Like this?

SELECT rolling.source1,
       rolling.source2,
       ts,
       value
  FROM (
        select distinct source1,
                        source2
          from rolling
         where source1 = 'aaa'
       ) as x
  JOIN rolling
    ON rolling.source1 = x.source1
   AND rolling.source2 = x.source2
 WHERE ts > 1
   AND ts < 10
ORDER BY 1,2,3;


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[hidden email]] On Behalf Of Gerlando Falauto
>Sent: Saturday, 2 February, 2019 15:20
>To: SQLite mailing list
>Subject: Re: [sqlite] Min/Max and skip-scan optimizations
>
>Hi,
>it's me again, struggling with indices once again.
>What I'm now trying to do is filter on source1 and by range of
>timestamp.
>Results should be naturally ordered by source1, source2,ts.
>
>1)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE source1 = 'aaa'
>AND ts > 1 AND ts < 10;
>
>QUERY PLAN
>`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
>ANY(source2)
>AND ts>? AND ts<?)
>
>This looks pretty OK on the sample dataset.
>For some reason though this doesn't really work like that with the
>real
>dataset (which has an extra index on ts only),
>and that seems to kick in instead of the real index.
>So the query plan comes out completely different, and rows are not
>naturally sorted by source1, source2, as I'd like.
>[I know I can use "+ts" and/or drop the "ts" index altogether, but
>I'm
>trying to make a point here...]
>So I add an extra ORDER BY clause:
>
>2)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE source1 = 'aaa'
>  AND ts > 1 AND ts < 100000000
>ORDER BY source1, source2, ts;
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
>ANY(source2)
>AND ts>? AND ts<?)
>`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
>
>Here I don't really understand why a TEMP B-TREE is required at all.
>Apparently, this adds extra processing at the end so I don't start
>pulling
>any rows until much later.
>If the index is used, data would be *naturally* sorted by its key, so
>I
>don't really understand.
>So I recur to a subquery:
>
>3)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE ((source1, source2) in
>        (SELECT DISTINCT source1, source2 from rolling where
>source1='aaa')
>  AND ts > 1 AND ts < 10)
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
>AND
>ts>? AND ts<?)
>`--LIST SUBQUERY
>   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
>
>This also seems to yield the right results, in the right order.
>But the order looks pretty much like a coincidence to me.
>So I'd rather explicitly state the sorting I'd like:
>
>4)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE ((source1, source2) in
>        (SELECT DISTINCT source1, source2 from rolling where
>source1='aaa')
>  AND ts > 1 AND ts < 10)
>ORDER BY source1, source2, ts
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
>AND
>ts>? AND ts<?)
>|--LIST SUBQUERY
>|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
>`--USE TEMP B-TREE FOR ORDER BY
>
>Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
>some
>global sorting
>which looks even worse than case 2.
>
>What am I doing wrong? Is this expected?
>I just don't seem to be able to get what would have been a pretty
>trivial
>task with
>last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
>
>Thanks in advance!
>Gerlando
>
>
>On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
><[hidden email]>
>wrote:
>
>> YES! Thank you!
>> Many thanks for the ".eqp full" tip also, that really explains a
>lot
>> (though I don't really understand any of it yet).
>>
>> Have a great day!
>> Gerlando
>>
>>
>> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[hidden email]>
>wrote:
>>
>>>
>>> Do you perhaps want this:
>>>
>>> select source1,
>>>        source2,
>>>        (
>>>         select min(ts)
>>>           from rolling
>>>          where source1 = x.source1
>>>            and source2 = x.source2
>>>        )
>>>   from (
>>>         select distinct source1,
>>>                         source2
>>>           from rolling
>>>        ) as x;
>>>
>>> SQLite version 3.27.0 2019-01-28 00:42:06
>>> Enter ".help" for usage hints.
>>> Connected to a transient in-memory database.
>>> Use ".open FILENAME" to reopen on a persistent database.
>>> sqlite> .timer on
>>> sqlite> CREATE TABLE `rolling` (
>>>    ...>     `source1`    TEXT NOT NULL,
>>>    ...>     `source2`    TEXT NOT NULL,
>>>    ...>     `ts`    INTEGER NOT NULL,
>>>    ...>     `value`    TEXT
>>>    ...> );
>>> Run Time: real 0.002 user 0.000000 sys 0.000000
>>> sqlite>
>>> sqlite> CREATE INDEX `sources` ON `rolling` (
>>>    ...>     `source1`,
>>>    ...>     `source2`,
>>>    ...>     `ts`
>>>    ...> );
>>> Run Time: real 0.001 user 0.000000 sys 0.000000
>>> sqlite>
>>> sqlite> INSERT INTO rolling
>>>    ...>     WITH RECURSIVE
>>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL
>VALUES("bbb")
>>> ),
>>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL
>VALUES("X2")
>>> UNION ALL
>>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
>>>    ...>       cnt( ts, value) AS (
>>>    ...>       VALUES( 0, "ZZZZ")
>>>    ...>         UNION ALL
>>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
>>>    ...>
>>>    ...>     select src1.source1, src2.source2, cnt.* from src1,
>src2, cnt;
>>> Run Time: real 8.920 user 8.843750 sys 0.078125
>>> sqlite>
>>> sqlite> analyze;
>>> Run Time: real 1.285 user 1.281250 sys 0.000000
>>> sqlite> .eqp full
>>> sqlite>
>>> sqlite> select source1,
>>>    ...>        source2,
>>>    ...>        (
>>>    ...>         select min(ts)
>>>    ...>           from rolling
>>>    ...>          where source1 = x.source1
>>>    ...>            and source2 = x.source2
>>>    ...>        )
>>>    ...>   from (
>>>    ...>         select distinct source1,
>>>    ...>                         source2
>>>    ...>           from rolling
>>>    ...>        ) as x;
>>> QUERY PLAN
>>> |--CO-ROUTINE 2
>>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320
>rows)
>>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
>>> `--CORRELATED SCALAR SUBQUERY 1
>>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?
>AND
>>> source2=?) (~983040 rows)
>>> addr  opcode         p1    p2    p3    p4             p5  comment
>>> ----  -------------  ----  ----  ----  -------------  --  --------
>-----
>>> 0     Init           0     64    0                    00  Start at
>64
>>> 1     InitCoroutine  1     23    2                    00  x
>>> 2     Null           1     4     0                    08
>r[4]=NULL
>>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3
>iDb=0;
>>> sources
>>> 4     ColumnsUsed    4     0     0     3              00
>>> 5     Explain        5     0     0     SCAN TABLE rolling USING
>COVERING
>>> INDEX sources (~7864320 rows)  00
>>> 6     Noop           0     0     0                    00  Begin
>>> WHERE-loop0: rolling
>>> 7     Rewind         4     21    2     0              00
>>> 8         Noop           0     0     0                    00
>Begin
>>> WHERE-core
>>> 9         Column         4     0     2                    00
>>> r[2]=rolling.source1
>>> 10        Column         4     1     3                    00
>>> r[3]=rolling.source2
>>> 11        Ne             2     13    4     (BINARY)       80  if
>>> r[4]!=r[2] goto 13
>>> 12        Eq             3     20    5     (BINARY)       80  if
>>> r[5]==r[3] goto 20
>>> 13        Copy           2     4     1                    00
>>> r[4..5]=r[2..3]
>>> 14        Yield          1     0     0                    00
>>> 15        Noop           0     0     0                    00  End
>>> WHERE-core
>>> 16        Column         4     0     6                    00
>r[6]=
>>> 17        Column         4     1     7                    00
>r[7]=
>>> 18        SeekGT         4     21    6     2              00
>key=r[6..7]
>>> 19      Goto           1     8     0                    00
>>> 20    Next           4     8     0                    01
>>> 21    Noop           0     0     0                    00  End
>>> WHERE-loop0: rolling
>>> 22    EndCoroutine   1     0     0                    00
>>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x
>(~7864320
>>> rows)  00
>>> 24    Noop           0     0     0                    00  Begin
>>> WHERE-loop0: x
>>> 25    InitCoroutine  1     0     2                    00
>>> 26      Yield          1     62    0                    00  next
>row of x
>>> 27      Noop           0     0     0                    00  Begin
>>> WHERE-core
>>> 28      Copy           2     9     0                    00
>r[9]=r[2];
>>> x.source1
>>> 29      Copy           3     10    0                    00
>r[10]=r[3];
>>> x.source2
>>> 30      Null           0     12    12                   00
>>> r[12..12]=NULL; Init subquery result
>>> 31      Integer        1     13    0                    00
>r[13]=1;
>>> LIMIT counter
>>> 32      Null           0     14    15                   00
>r[14..15]=NULL
>>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3
>iDb=0;
>>> sources
>>> 34      ColumnsUsed    5     0     0     7              00
>>> 35      Explain        35    0     0     SEARCH TABLE rolling
>USING
>>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
>00
>>> 36      Noop           0     0     0                    00  Begin
>>> WHERE-loop0: rolling
>>> 37      Copy           2     16    0                    00
>r[16]=r[2];
>>> x.source1
>>> 38      Copy           3     17    0                    00
>r[17]=r[3];
>>> x.source2
>>> 39      CursorHint     5     0     0
>AND(EQ(c0,r[16]),EQ(c1,r[17]))
>>> 00
>>> 40      Copy           2     18    0                    00
>r[18]=r[2];
>>> x.source1
>>> 41      IsNull         18    54    0                    00  if
>>> r[18]==NULL goto 54
>>> 42      Copy           3     19    0                    00
>r[19]=r[3];
>>> x.source2
>>> 43      IsNull         19    54    0                    00  if
>>> r[19]==NULL goto 54
>>> 44      Null           0     20    0                    00
>r[20]=NULL
>>> 45      SeekGT         5     54    18    3              00
>key=r[18..20]
>>> 46        IdxGT          5     54    18    2              00
>>> key=r[18..19]
>>> 47        Noop           0     0     0                    00
>Begin
>>> WHERE-core
>>> 48        Column         5     2     21                   00
>>> r[21]=rolling.ts
>>> 49        CollSeq        0     0     0     (BINARY)       00
>>> 50        AggStep        0     21    14    min(1)         01
>accum=r[14]
>>> step(r[21])
>>> 51        Goto           0     55    0                    00
>min() by
>>> index
>>> 52        Noop           0     0     0                    00  End
>>> WHERE-core
>>> 53      Next           5     46    0                    00
>>> 54      Noop           0     0     0                    00  End
>>> WHERE-loop0: rolling
>>> 55      AggFinal       14    1     0     min(1)         00
>accum=r[14]
>>> N=1
>>> 56      Copy           14    12    0                    00
>r[12]=r[14]
>>> 57      DecrJumpZero   13    58    0                    00  if
>>> (--r[13])==0 goto 58
>>> 58      Copy           12    11    0                    00
>r[11]=r[12]
>>> 59      ResultRow      9     3     0                    00
>>> output=r[9..11]
>>> 60      Noop           0     0     0                    00  End
>WHERE-core
>>> 61    Goto           0     26    0                    00
>>> 62    Noop           0     0     0                    00  End
>>> WHERE-loop0: x
>>> 63    Halt           0     0     0                    00
>>> 64    Transaction    0     0     3     0              01
>>> usesStmtJournal=0
>>> 65    Goto           0     1     0                    00
>>> aaa|X1|0
>>> aaa|X2|0
>>> aaa|X3|0
>>> aaa|X4|0
>>> bbb|X1|0
>>> bbb|X2|0
>>> bbb|X3|0
>>> bbb|X4|0
>>> Run Time: real 0.134 user 0.000000 sys 0.000000
>>> sqlite>
>>>
>>> ---
>>> The fact that there's a Highway to Hell but only a Stairway to
>Heaven
>>> says a lot about anticipated traffic volume.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [hidden email]
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>users
>>>
>>
>_______________________________________________
>sqlite-users mailing list
>[hidden email]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
In reply to this post by Simon Slavin-3
Simon, Tim,

Il sab 2 feb 2019, 23:40 Simon Slavin <[hidden email]> ha scritto:

> On 2 Feb 2019, at 10:19pm, Gerlando Falauto <[hidden email]>
> wrote:
>
> > Results should be naturally ordered by source1, source2,ts.
>
> [Sorry, I missed this the first time.  Thanks, Tim.]
>
> Sorry, no.  You're making assumptions about how SQLite works internally.
> If you want your results sorted, ask for them sorted.  If you don't, don't.
>

I do want them sorted, and I also want the whole (huge) dataset to be
processable without having to store it all in memory or temp files.
Sounds like the whole purpose of an index, doesn't it?
I do know SQL is all about the result, not how it's obtained, though.


> Note that if you don't ask for them sorted and they come out sorted by
> good luck, there's no guarantee that they'll be sorted if you put different
> data in the table, or the same data with the rows in a different order, or
> if you update to a later version of SQLite.
>

I do realize that, that's why I'm not happy with that seems to do the trick
without an explicit order by clause.

Thank you for your help!

Gerlando


>
> Simon.
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
In reply to this post by Keith Medcalf
Thanks Keith, I'll give it a go and let you know!
I still don't get how that differs from
2) or 4) below, though.

Thanks again!
Gerlando

Il dom 3 feb 2019, 00:27 Keith Medcalf <[hidden email]> ha scritto:

>
> Like this?
>
> SELECT rolling.source1,
>        rolling.source2,
>        ts,
>        value
>   FROM (
>         select distinct source1,
>                         source2
>           from rolling
>          where source1 = 'aaa'
>        ) as x
>   JOIN rolling
>     ON rolling.source1 = x.source1
>    AND rolling.source2 = x.source2
>  WHERE ts > 1
>    AND ts < 10
> ORDER BY 1,2,3;
>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-----Original Message-----
> >From: sqlite-users [mailto:sqlite-users-
> >[hidden email]] On Behalf Of Gerlando Falauto
> >Sent: Saturday, 2 February, 2019 15:20
> >To: SQLite mailing list
> >Subject: Re: [sqlite] Min/Max and skip-scan optimizations
> >
> >Hi,
> >it's me again, struggling with indices once again.
> >What I'm now trying to do is filter on source1 and by range of
> >timestamp.
> >Results should be naturally ordered by source1, source2,ts.
> >
> >1)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >AND ts > 1 AND ts < 10;
> >
> >QUERY PLAN
> >`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >
> >This looks pretty OK on the sample dataset.
> >For some reason though this doesn't really work like that with the
> >real
> >dataset (which has an extra index on ts only),
> >and that seems to kick in instead of the real index.
> >So the query plan comes out completely different, and rows are not
> >naturally sorted by source1, source2, as I'd like.
> >[I know I can use "+ts" and/or drop the "ts" index altogether, but
> >I'm
> >trying to make a point here...]
> >So I add an extra ORDER BY clause:
> >
> >2)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 100000000
> >ORDER BY source1, source2, ts;
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
> >
> >Here I don't really understand why a TEMP B-TREE is required at all.
> >Apparently, this adds extra processing at the end so I don't start
> >pulling
> >any rows until much later.
> >If the index is used, data would be *naturally* sorted by its key, so
> >I
> >don't really understand.
> >So I recur to a subquery:
> >
> >3)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >`--LIST SUBQUERY
> >   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >
> >This also seems to yield the right results, in the right order.
> >But the order looks pretty much like a coincidence to me.
> >So I'd rather explicitly state the sorting I'd like:
> >
> >4)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >ORDER BY source1, source2, ts
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >|--LIST SUBQUERY
> >|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >`--USE TEMP B-TREE FOR ORDER BY
> >
> >Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
> >some
> >global sorting
> >which looks even worse than case 2.
> >
> >What am I doing wrong? Is this expected?
> >I just don't seem to be able to get what would have been a pretty
> >trivial
> >task with
> >last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
> >
> >Thanks in advance!
> >Gerlando
> >
> >
> >On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
> ><[hidden email]>
> >wrote:
> >
> >> YES! Thank you!
> >> Many thanks for the ".eqp full" tip also, that really explains a
> >lot
> >> (though I don't really understand any of it yet).
> >>
> >> Have a great day!
> >> Gerlando
> >>
> >>
> >> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[hidden email]>
> >wrote:
> >>
> >>>
> >>> Do you perhaps want this:
> >>>
> >>> select source1,
> >>>        source2,
> >>>        (
> >>>         select min(ts)
> >>>           from rolling
> >>>          where source1 = x.source1
> >>>            and source2 = x.source2
> >>>        )
> >>>   from (
> >>>         select distinct source1,
> >>>                         source2
> >>>           from rolling
> >>>        ) as x;
> >>>
> >>> SQLite version 3.27.0 2019-01-28 00:42:06
> >>> Enter ".help" for usage hints.
> >>> Connected to a transient in-memory database.
> >>> Use ".open FILENAME" to reopen on a persistent database.
> >>> sqlite> .timer on
> >>> sqlite> CREATE TABLE `rolling` (
> >>>    ...>     `source1`    TEXT NOT NULL,
> >>>    ...>     `source2`    TEXT NOT NULL,
> >>>    ...>     `ts`    INTEGER NOT NULL,
> >>>    ...>     `value`    TEXT
> >>>    ...> );
> >>> Run Time: real 0.002 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> CREATE INDEX `sources` ON `rolling` (
> >>>    ...>     `source1`,
> >>>    ...>     `source2`,
> >>>    ...>     `ts`
> >>>    ...> );
> >>> Run Time: real 0.001 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> INSERT INTO rolling
> >>>    ...>     WITH RECURSIVE
> >>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL
> >VALUES("bbb")
> >>> ),
> >>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL
> >VALUES("X2")
> >>> UNION ALL
> >>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
> >>>    ...>       cnt( ts, value) AS (
> >>>    ...>       VALUES( 0, "ZZZZ")
> >>>    ...>         UNION ALL
> >>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
> >>>    ...>
> >>>    ...>     select src1.source1, src2.source2, cnt.* from src1,
> >src2, cnt;
> >>> Run Time: real 8.920 user 8.843750 sys 0.078125
> >>> sqlite>
> >>> sqlite> analyze;
> >>> Run Time: real 1.285 user 1.281250 sys 0.000000
> >>> sqlite> .eqp full
> >>> sqlite>
> >>> sqlite> select source1,
> >>>    ...>        source2,
> >>>    ...>        (
> >>>    ...>         select min(ts)
> >>>    ...>           from rolling
> >>>    ...>          where source1 = x.source1
> >>>    ...>            and source2 = x.source2
> >>>    ...>        )
> >>>    ...>   from (
> >>>    ...>         select distinct source1,
> >>>    ...>                         source2
> >>>    ...>           from rolling
> >>>    ...>        ) as x;
> >>> QUERY PLAN
> >>> |--CO-ROUTINE 2
> >>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320
> >rows)
> >>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
> >>> `--CORRELATED SCALAR SUBQUERY 1
> >>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?
> >AND
> >>> source2=?) (~983040 rows)
> >>> addr  opcode         p1    p2    p3    p4             p5  comment
> >>> ----  -------------  ----  ----  ----  -------------  --  --------
> >-----
> >>> 0     Init           0     64    0                    00  Start at
> >64
> >>> 1     InitCoroutine  1     23    2                    00  x
> >>> 2     Null           1     4     0                    08
> >r[4]=NULL
> >>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 4     ColumnsUsed    4     0     0     3              00
> >>> 5     Explain        5     0     0     SCAN TABLE rolling USING
> >COVERING
> >>> INDEX sources (~7864320 rows)  00
> >>> 6     Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 7     Rewind         4     21    2     0              00
> >>> 8         Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 9         Column         4     0     2                    00
> >>> r[2]=rolling.source1
> >>> 10        Column         4     1     3                    00
> >>> r[3]=rolling.source2
> >>> 11        Ne             2     13    4     (BINARY)       80  if
> >>> r[4]!=r[2] goto 13
> >>> 12        Eq             3     20    5     (BINARY)       80  if
> >>> r[5]==r[3] goto 20
> >>> 13        Copy           2     4     1                    00
> >>> r[4..5]=r[2..3]
> >>> 14        Yield          1     0     0                    00
> >>> 15        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 16        Column         4     0     6                    00
> >r[6]=
> >>> 17        Column         4     1     7                    00
> >r[7]=
> >>> 18        SeekGT         4     21    6     2              00
> >key=r[6..7]
> >>> 19      Goto           1     8     0                    00
> >>> 20    Next           4     8     0                    01
> >>> 21    Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 22    EndCoroutine   1     0     0                    00
> >>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x
> >(~7864320
> >>> rows)  00
> >>> 24    Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: x
> >>> 25    InitCoroutine  1     0     2                    00
> >>> 26      Yield          1     62    0                    00  next
> >row of x
> >>> 27      Noop           0     0     0                    00  Begin
> >>> WHERE-core
> >>> 28      Copy           2     9     0                    00
> >r[9]=r[2];
> >>> x.source1
> >>> 29      Copy           3     10    0                    00
> >r[10]=r[3];
> >>> x.source2
> >>> 30      Null           0     12    12                   00
> >>> r[12..12]=NULL; Init subquery result
> >>> 31      Integer        1     13    0                    00
> >r[13]=1;
> >>> LIMIT counter
> >>> 32      Null           0     14    15                   00
> >r[14..15]=NULL
> >>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 34      ColumnsUsed    5     0     0     7              00
> >>> 35      Explain        35    0     0     SEARCH TABLE rolling
> >USING
> >>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
> >00
> >>> 36      Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 37      Copy           2     16    0                    00
> >r[16]=r[2];
> >>> x.source1
> >>> 38      Copy           3     17    0                    00
> >r[17]=r[3];
> >>> x.source2
> >>> 39      CursorHint     5     0     0
> >AND(EQ(c0,r[16]),EQ(c1,r[17]))
> >>> 00
> >>> 40      Copy           2     18    0                    00
> >r[18]=r[2];
> >>> x.source1
> >>> 41      IsNull         18    54    0                    00  if
> >>> r[18]==NULL goto 54
> >>> 42      Copy           3     19    0                    00
> >r[19]=r[3];
> >>> x.source2
> >>> 43      IsNull         19    54    0                    00  if
> >>> r[19]==NULL goto 54
> >>> 44      Null           0     20    0                    00
> >r[20]=NULL
> >>> 45      SeekGT         5     54    18    3              00
> >key=r[18..20]
> >>> 46        IdxGT          5     54    18    2              00
> >>> key=r[18..19]
> >>> 47        Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 48        Column         5     2     21                   00
> >>> r[21]=rolling.ts
> >>> 49        CollSeq        0     0     0     (BINARY)       00
> >>> 50        AggStep        0     21    14    min(1)         01
> >accum=r[14]
> >>> step(r[21])
> >>> 51        Goto           0     55    0                    00
> >min() by
> >>> index
> >>> 52        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 53      Next           5     46    0                    00
> >>> 54      Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 55      AggFinal       14    1     0     min(1)         00
> >accum=r[14]
> >>> N=1
> >>> 56      Copy           14    12    0                    00
> >r[12]=r[14]
> >>> 57      DecrJumpZero   13    58    0                    00  if
> >>> (--r[13])==0 goto 58
> >>> 58      Copy           12    11    0                    00
> >r[11]=r[12]
> >>> 59      ResultRow      9     3     0                    00
> >>> output=r[9..11]
> >>> 60      Noop           0     0     0                    00  End
> >WHERE-core
> >>> 61    Goto           0     26    0                    00
> >>> 62    Noop           0     0     0                    00  End
> >>> WHERE-loop0: x
> >>> 63    Halt           0     0     0                    00
> >>> 64    Transaction    0     0     3     0              01
> >>> usesStmtJournal=0
> >>> 65    Goto           0     1     0                    00
> >>> aaa|X1|0
> >>> aaa|X2|0
> >>> aaa|X3|0
> >>> aaa|X4|0
> >>> bbb|X1|0
> >>> bbb|X2|0
> >>> bbb|X3|0
> >>> bbb|X4|0
> >>> Run Time: real 0.134 user 0.000000 sys 0.000000
> >>> sqlite>
> >>>
> >>> ---
> >>> The fact that there's a Highway to Hell but only a Stairway to
> >Heaven
> >>> says a lot about anticipated traffic volume.
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> sqlite-users mailing list
> >>> [hidden email]
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> >users
> >>>
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >[hidden email]
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Simon Slavin-3
In reply to this post by Gerlando Falauto
On 3 Feb 2019, at 9:52am, Gerlando Falauto <[hidden email]> wrote:

> I do want them sorted, and I also want the whole (huge) dataset to be
> processable without having to store it all in memory or temp files.
> Sounds like the whole purpose of an index, doesn't it?
> I do know SQL is all about the result, not how it's obtained, though.

Absolutely.  You should be fine.

Try using the procedure in my previous message (the one which mentions ANALYZE) to find out the most useful index.

Once you've found that, you can delete the other index.  Then you've solved the problem and can move on.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
In reply to this post by Keith Medcalf
Hi Keith,

here's what I get as a query plan for your query:

5)
SELECT rolling.source1,
       rolling.source2,
       ts,
       value
  FROM (
        select distinct source1,
                        source2
          from rolling
         where source1 = 'aaa'
       ) as x
  JOIN rolling
    ON rolling.source1 = x.source1
   AND rolling.source2 = x.source2
 WHERE ts > 1
   AND ts < 100000000
ORDER BY 1,2,3;

QUERY PLAN
|--MATERIALIZE 1
|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)
`--SEARCH SUBQUERY 1 AS x USING AUTOMATIC COVERING INDEX (source2=? AND
source1=?)
Run Time: real 40.343 user 39.380000 sys 0.710000

This looks only "slightly" slower than the optimum:

1)
SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
AND ts > 1 AND ts < 100000000;

QUERY PLAN
`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)
Run Time: real 29.435 user 28.030000 sys 0.880000

What bothers me most is the MATERIALIZE step, which I believe should be
related to the SELECT DISTINCT step.
If that's really the case, I know for a fact this will only yield a very
limited number of results, so that may not be that much of an issue.
Is my understanding correct?
Yet I don't understand why this materializing step does apply to 5) and not
3), while the documentation seems to suggest quite the opposite.

IMHO, adding the ORDER BY clause to query 1) above (i.e. query 2) should
ideally yield the exact same query plan.
In the end adding an ORDER BY clause on the exact same columns of the index
used to traverse the table, should be easily recognizable.
Knowing absolutely nothing about the internals though, I have no idea
whether this particular use case has been overlooked, or it would just be
unfeasible to handle it.

Thank you again everyone!
Gerlando

On Sun, Feb 3, 2019 at 12:27 AM Keith Medcalf <[hidden email]> wrote:

>
> Like this?
>
> SELECT rolling.source1,
>        rolling.source2,
>        ts,
>        value
>   FROM (
>         select distinct source1,
>                         source2
>           from rolling
>          where source1 = 'aaa'
>        ) as x
>   JOIN rolling
>     ON rolling.source1 = x.source1
>    AND rolling.source2 = x.source2
>  WHERE ts > 1
>    AND ts < 10
> ORDER BY 1,2,3;
>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-----Original Message-----
> >From: sqlite-users [mailto:sqlite-users-
> >[hidden email]] On Behalf Of Gerlando Falauto
> >Sent: Saturday, 2 February, 2019 15:20
> >To: SQLite mailing list
> >Subject: Re: [sqlite] Min/Max and skip-scan optimizations
> >
> >Hi,
> >it's me again, struggling with indices once again.
> >What I'm now trying to do is filter on source1 and by range of
> >timestamp.
> >Results should be naturally ordered by source1, source2,ts.
> >
> >1)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >AND ts > 1 AND ts < 10;
> >
> >QUERY PLAN
> >`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >
> >This looks pretty OK on the sample dataset.
> >For some reason though this doesn't really work like that with the
> >real
> >dataset (which has an extra index on ts only),
> >and that seems to kick in instead of the real index.
> >So the query plan comes out completely different, and rows are not
> >naturally sorted by source1, source2, as I'd like.
> >[I know I can use "+ts" and/or drop the "ts" index altogether, but
> >I'm
> >trying to make a point here...]
> >So I add an extra ORDER BY clause:
> >
> >2)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 100000000
> >ORDER BY source1, source2, ts;
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
> >AND ts>? AND ts<?)
> >`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
> >
> >Here I don't really understand why a TEMP B-TREE is required at all.
> >Apparently, this adds extra processing at the end so I don't start
> >pulling
> >any rows until much later.
> >If the index is used, data would be *naturally* sorted by its key, so
> >I
> >don't really understand.
> >So I recur to a subquery:
> >
> >3)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >`--LIST SUBQUERY
> >   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >
> >This also seems to yield the right results, in the right order.
> >But the order looks pretty much like a coincidence to me.
> >So I'd rather explicitly state the sorting I'd like:
> >
> >4)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >        (SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >ORDER BY source1, source2, ts
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
> >ts>? AND ts<?)
> >|--LIST SUBQUERY
> >|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >`--USE TEMP B-TREE FOR ORDER BY
> >
> >Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
> >some
> >global sorting
> >which looks even worse than case 2.
> >
> >What am I doing wrong? Is this expected?
> >I just don't seem to be able to get what would have been a pretty
> >trivial
> >task with
> >last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
> >
> >Thanks in advance!
> >Gerlando
> >
> >
> >On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
> ><[hidden email]>
> >wrote:
> >
> >> YES! Thank you!
> >> Many thanks for the ".eqp full" tip also, that really explains a
> >lot
> >> (though I don't really understand any of it yet).
> >>
> >> Have a great day!
> >> Gerlando
> >>
> >>
> >> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf <[hidden email]>
> >wrote:
> >>
> >>>
> >>> Do you perhaps want this:
> >>>
> >>> select source1,
> >>>        source2,
> >>>        (
> >>>         select min(ts)
> >>>           from rolling
> >>>          where source1 = x.source1
> >>>            and source2 = x.source2
> >>>        )
> >>>   from (
> >>>         select distinct source1,
> >>>                         source2
> >>>           from rolling
> >>>        ) as x;
> >>>
> >>> SQLite version 3.27.0 2019-01-28 00:42:06
> >>> Enter ".help" for usage hints.
> >>> Connected to a transient in-memory database.
> >>> Use ".open FILENAME" to reopen on a persistent database.
> >>> sqlite> .timer on
> >>> sqlite> CREATE TABLE `rolling` (
> >>>    ...>     `source1`    TEXT NOT NULL,
> >>>    ...>     `source2`    TEXT NOT NULL,
> >>>    ...>     `ts`    INTEGER NOT NULL,
> >>>    ...>     `value`    TEXT
> >>>    ...> );
> >>> Run Time: real 0.002 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> CREATE INDEX `sources` ON `rolling` (
> >>>    ...>     `source1`,
> >>>    ...>     `source2`,
> >>>    ...>     `ts`
> >>>    ...> );
> >>> Run Time: real 0.001 user 0.000000 sys 0.000000
> >>> sqlite>
> >>> sqlite> INSERT INTO rolling
> >>>    ...>     WITH RECURSIVE
> >>>    ...>       src1( source1 ) AS ( VALUES("aaa") UNION ALL
> >VALUES("bbb")
> >>> ),
> >>>    ...>       src2( source2 ) AS ( VALUES("X1") UNION ALL
> >VALUES("X2")
> >>> UNION ALL
> >>>    ...> VALUES("X3") UNION ALL VALUES("X4") ),
> >>>    ...>       cnt( ts, value) AS (
> >>>    ...>       VALUES( 0, "ZZZZ")
> >>>    ...>         UNION ALL
> >>>    ...>       SELECT ts+1, value FROM cnt LIMIT 1000000)
> >>>    ...>
> >>>    ...>     select src1.source1, src2.source2, cnt.* from src1,
> >src2, cnt;
> >>> Run Time: real 8.920 user 8.843750 sys 0.078125
> >>> sqlite>
> >>> sqlite> analyze;
> >>> Run Time: real 1.285 user 1.281250 sys 0.000000
> >>> sqlite> .eqp full
> >>> sqlite>
> >>> sqlite> select source1,
> >>>    ...>        source2,
> >>>    ...>        (
> >>>    ...>         select min(ts)
> >>>    ...>           from rolling
> >>>    ...>          where source1 = x.source1
> >>>    ...>            and source2 = x.source2
> >>>    ...>        )
> >>>    ...>   from (
> >>>    ...>         select distinct source1,
> >>>    ...>                         source2
> >>>    ...>           from rolling
> >>>    ...>        ) as x;
> >>> QUERY PLAN
> >>> |--CO-ROUTINE 2
> >>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320
> >rows)
> >>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
> >>> `--CORRELATED SCALAR SUBQUERY 1
> >>>    `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?
> >AND
> >>> source2=?) (~983040 rows)
> >>> addr  opcode         p1    p2    p3    p4             p5  comment
> >>> ----  -------------  ----  ----  ----  -------------  --  --------
> >-----
> >>> 0     Init           0     64    0                    00  Start at
> >64
> >>> 1     InitCoroutine  1     23    2                    00  x
> >>> 2     Null           1     4     0                    08
> >r[4]=NULL
> >>> 3     OpenRead       4     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 4     ColumnsUsed    4     0     0     3              00
> >>> 5     Explain        5     0     0     SCAN TABLE rolling USING
> >COVERING
> >>> INDEX sources (~7864320 rows)  00
> >>> 6     Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 7     Rewind         4     21    2     0              00
> >>> 8         Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 9         Column         4     0     2                    00
> >>> r[2]=rolling.source1
> >>> 10        Column         4     1     3                    00
> >>> r[3]=rolling.source2
> >>> 11        Ne             2     13    4     (BINARY)       80  if
> >>> r[4]!=r[2] goto 13
> >>> 12        Eq             3     20    5     (BINARY)       80  if
> >>> r[5]==r[3] goto 20
> >>> 13        Copy           2     4     1                    00
> >>> r[4..5]=r[2..3]
> >>> 14        Yield          1     0     0                    00
> >>> 15        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 16        Column         4     0     6                    00
> >r[6]=
> >>> 17        Column         4     1     7                    00
> >r[7]=
> >>> 18        SeekGT         4     21    6     2              00
> >key=r[6..7]
> >>> 19      Goto           1     8     0                    00
> >>> 20    Next           4     8     0                    01
> >>> 21    Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 22    EndCoroutine   1     0     0                    00
> >>> 23    Explain        23    0     0     SCAN SUBQUERY 2 AS x
> >(~7864320
> >>> rows)  00
> >>> 24    Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: x
> >>> 25    InitCoroutine  1     0     2                    00
> >>> 26      Yield          1     62    0                    00  next
> >row of x
> >>> 27      Noop           0     0     0                    00  Begin
> >>> WHERE-core
> >>> 28      Copy           2     9     0                    00
> >r[9]=r[2];
> >>> x.source1
> >>> 29      Copy           3     10    0                    00
> >r[10]=r[3];
> >>> x.source2
> >>> 30      Null           0     12    12                   00
> >>> r[12..12]=NULL; Init subquery result
> >>> 31      Integer        1     13    0                    00
> >r[13]=1;
> >>> LIMIT counter
> >>> 32      Null           0     14    15                   00
> >r[14..15]=NULL
> >>> 33      OpenRead       5     3     0     k(4,,,,)       00  root=3
> >iDb=0;
> >>> sources
> >>> 34      ColumnsUsed    5     0     0     7              00
> >>> 35      Explain        35    0     0     SEARCH TABLE rolling
> >USING
> >>> COVERING INDEX sources (source1=? AND source2=?) (~983040 rows)
> >00
> >>> 36      Noop           0     0     0                    00  Begin
> >>> WHERE-loop0: rolling
> >>> 37      Copy           2     16    0                    00
> >r[16]=r[2];
> >>> x.source1
> >>> 38      Copy           3     17    0                    00
> >r[17]=r[3];
> >>> x.source2
> >>> 39      CursorHint     5     0     0
> >AND(EQ(c0,r[16]),EQ(c1,r[17]))
> >>> 00
> >>> 40      Copy           2     18    0                    00
> >r[18]=r[2];
> >>> x.source1
> >>> 41      IsNull         18    54    0                    00  if
> >>> r[18]==NULL goto 54
> >>> 42      Copy           3     19    0                    00
> >r[19]=r[3];
> >>> x.source2
> >>> 43      IsNull         19    54    0                    00  if
> >>> r[19]==NULL goto 54
> >>> 44      Null           0     20    0                    00
> >r[20]=NULL
> >>> 45      SeekGT         5     54    18    3              00
> >key=r[18..20]
> >>> 46        IdxGT          5     54    18    2              00
> >>> key=r[18..19]
> >>> 47        Noop           0     0     0                    00
> >Begin
> >>> WHERE-core
> >>> 48        Column         5     2     21                   00
> >>> r[21]=rolling.ts
> >>> 49        CollSeq        0     0     0     (BINARY)       00
> >>> 50        AggStep        0     21    14    min(1)         01
> >accum=r[14]
> >>> step(r[21])
> >>> 51        Goto           0     55    0                    00
> >min() by
> >>> index
> >>> 52        Noop           0     0     0                    00  End
> >>> WHERE-core
> >>> 53      Next           5     46    0                    00
> >>> 54      Noop           0     0     0                    00  End
> >>> WHERE-loop0: rolling
> >>> 55      AggFinal       14    1     0     min(1)         00
> >accum=r[14]
> >>> N=1
> >>> 56      Copy           14    12    0                    00
> >r[12]=r[14]
> >>> 57      DecrJumpZero   13    58    0                    00  if
> >>> (--r[13])==0 goto 58
> >>> 58      Copy           12    11    0                    00
> >r[11]=r[12]
> >>> 59      ResultRow      9     3     0                    00
> >>> output=r[9..11]
> >>> 60      Noop           0     0     0                    00  End
> >WHERE-core
> >>> 61    Goto           0     26    0                    00
> >>> 62    Noop           0     0     0                    00  End
> >>> WHERE-loop0: x
> >>> 63    Halt           0     0     0                    00
> >>> 64    Transaction    0     0     3     0              01
> >>> usesStmtJournal=0
> >>> 65    Goto           0     1     0                    00
> >>> aaa|X1|0
> >>> aaa|X2|0
> >>> aaa|X3|0
> >>> aaa|X4|0
> >>> bbb|X1|0
> >>> bbb|X2|0
> >>> bbb|X3|0
> >>> bbb|X4|0
> >>> Run Time: real 0.134 user 0.000000 sys 0.000000
> >>> sqlite>
> >>>
> >>> ---
> >>> The fact that there's a Highway to Hell but only a Stairway to
> >Heaven
> >>> says a lot about anticipated traffic volume.
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> sqlite-users mailing list
> >>> [hidden email]
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> >users
> >>>
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >[hidden email]
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Luuk

On 3-2-2019 23:29, Gerlando Falauto wrote:
> IMHO, adding the ORDER BY clause to query 1) above (i.e. query 2) should
> ideally yield the exact same query plan.
> In the end adding an ORDER BY clause on the exact same columns of the index
> used to traverse the table, should be easily recognizable.
> Knowing absolutely nothing about the internals though, I have no idea
> whether this particular use case has been overlooked, or it would just be
> unfeasible to handle it.


In SQL, when doing a SELECT, the order of the results is undetermined
(by definition).

If you want/need to results to be ORDERed, you need to add 'ORDER BY'.
This will always show as an extra step in your QUERY PLAN.

One can never know (for sure) if the output of this is in the correct order:

CREATE TABLE test(i primary key);
INSERT INTO test values(4);
INSERT INTO test values(2);
INSERT INTO test values(3);
INSERT INTO test values(1);
SELECT i FROM test;

4
2
3
1

To 'know' it is in the correct order one has to define an ORDER BY to
specify in which order the data should be returned

SELECT i FROM test order by 2*i+i%2;

1
2
3
4



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
Thank you Luuk, I understand your point.
However, the query plan already takes advantage of the index and should be
retrieving data in that order.
Reading the docs
https://www.sqlite.org/optoverview.html#order_by_optimizations my
understanding was that
SQLite would be taking advantage of that.
So perhaps my use case it's too complicated (many columns -- some filtered,
some not -- skip/scan, all together) to make it obvious to the query
planner that data *is* already sorted.
Or maybe it never occurred to anyone that someone might be trying to do
something like that.
Or (most likely) my understanding of how data is retrieved is plain wrong...

Thank you!
Gerlando


On Mon, Feb 4, 2019 at 1:26 PM Luuk <[hidden email]> wrote:

>
> On 3-2-2019 23:29, Gerlando Falauto wrote:
> > IMHO, adding the ORDER BY clause to query 1) above (i.e. query 2) should
> > ideally yield the exact same query plan.
> > In the end adding an ORDER BY clause on the exact same columns of the
> index
> > used to traverse the table, should be easily recognizable.
> > Knowing absolutely nothing about the internals though, I have no idea
> > whether this particular use case has been overlooked, or it would just be
> > unfeasible to handle it.
>
>
> In SQL, when doing a SELECT, the order of the results is undetermined
> (by definition).
>
> If you want/need to results to be ORDERed, you need to add 'ORDER BY'.
> This will always show as an extra step in your QUERY PLAN.
>
> One can never know (for sure) if the output of this is in the correct
> order:
>
> CREATE TABLE test(i primary key);
> INSERT INTO test values(4);
> INSERT INTO test values(2);
> INSERT INTO test values(3);
> INSERT INTO test values(1);
> SELECT i FROM test;
>
> 4
> 2
> 3
> 1
>
> To 'know' it is in the correct order one has to define an ORDER BY to
> specify in which order the data should be returned
>
> SELECT i FROM test order by 2*i+i%2;
>
> 1
> 2
> 3
> 4
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [hidden email]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Luuk

On 4-2-2019 14:55, Gerlando Falauto wrote:

> Thank you Luuk, I understand your point.
> However, the query plan already takes advantage of the index and should be
> retrieving data in that order.
> Reading the docs
> https://www.sqlite.org/optoverview.html#order_by_optimizations my
> understanding was that
> SQLite would be taking advantage of that.
> So perhaps my use case it's too complicated (many columns -- some filtered,
> some not -- skip/scan, all together) to make it obvious to the query
> planner that data *is* already sorted.
> Or maybe it never occurred to anyone that someone might be trying to do
> something like that.
> Or (most likely) my understanding of how data is retrieved is plain wrong...
>
> Thank you!
> Gerlando
>
>
It says:

SQLite *attempts* to use an index to satisfy the ORDER BY clause of a
query when possible


To be (abolutely!) SURE results are in the correct order, you need an
ORDER BY.

_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Simon Slavin-3
In reply to this post by Gerlando Falauto
On 4 Feb 2019, at 1:55pm, Gerlando Falauto <[hidden email]> wrote:

> Or (most likely) my understanding of how data is retrieved is plain wrong...

Or your understanding how the current version of SQLite is correct, but a later version of SQLite will have different optimizations and do things differently.  So at some point you'll update your libraries and suddenly things won't work any more.  So if you depend on sorting, future-proof your code by asking for sorting.

By the way, here's an example of a SQL engine (not SQLite) not using an index when you though it would.  Suppose you have a short table …just 40 rows:

CREATE TABLE MyTable (a INTEGER, b TEXT);
CREATE UNIQUE INDEX MT_a ON MyTable (a);
INSERT <40 rows of data into MyTable>

SELECT a,b FROM MyTable ORDER BY a;

The assumed plan would be to use the index to retrieve the row order, then to look up each retrieved row in the table to retrieve the value for b.  This requires one index walk plus 40 table lookups.

But the engine knows that 40 table lookups takes a long time.  It would be faster to read the table, then sort it internally.  It's a table with only 40 rows, so sorting it would be fast and take only a little memory.  That saves 40 lookups.

So even though there's an index, it's not a covering index (it doesn't contain all the data needed) so it won't be used.

Simon.
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
In reply to this post by Luuk
Hi Luuk,

It says:
>
> SQLite *attempts* to use an index to satisfy the ORDER BY clause of a
> query when possible
>
>
> To be (abolutely!) SURE results are in the correct order, you need an
> ORDER BY.
>

No questioning about that. ORDER BY *must* be there in order to get the
results correctly sorted.

What I'm saying is it shouldn't add any extra overhead, which apparently
does in my case.

Notice how ORDER BY does not add any in all the trivial cases:

sqlite> explain query plan select * from rolling order by
source1,source2,ts;
QUERY PLAN
`--SCAN TABLE rolling USING INDEX sources
sqlite> explain query plan select * from rolling order by
source1,source2;
QUERY PLAN
`--SCAN TABLE rolling USING INDEX sources
sqlite> explain query plan select * from rolling order by source1;
QUERY PLAN
`--SCAN TABLE rolling USING INDEX sources
sqlite> explain query plan select source1,source2,ts from rolling order by
source1,source2,ts;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
sqlite> explain query plan select source1,source2,ts from rolling order by
source1,source2;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
sqlite> explain query plan select source1,source2,ts from rolling order by
source1;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources

It's just that when things get a bit more complicated, I start getting
surprising results (for my use case -- which, I admit, is unusual at least).

Any suggestion welcome.
Thanks,
Gerlando
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Gerlando Falauto
In reply to this post by Simon Slavin-3
On Mon, Feb 4, 2019 at 4:52 PM Simon Slavin <[hidden email]> wrote:

> On 4 Feb 2019, at 1:55pm, Gerlando Falauto <[hidden email]>
> wrote:
>
> > Or (most likely) my understanding of how data is retrieved is plain
> wrong...
>
> Or your understanding how the current version of SQLite is correct, but a
> later version of SQLite will have different optimizations and do things
> differently.  So at some point you'll update your libraries and suddenly
> things won't work any more.  So if you depend on sorting, future-proof your
> code by asking for sorting.
>

As I just wrote in replty to Luuk's comment, I'm not questioning the usage
of ORDER BY. It should be there. I just would expect it to add ZERO
overhead.
But again, perhaps I'm making the wrong assumption that using the index
data would be "already sorted", perhaps it isn't.
I wonder if I'd be allowed to add an ORDER BY in the subquery and if that
would make any difference -- I remember reading ORDER BY is only allowed in
the outer query (which makes perfect sense).


> By the way, here's an example of a SQL engine (not SQLite) not using an
> index when you though it would.  Suppose you have a short table …just 40
> rows:
>
> CREATE TABLE MyTable (a INTEGER, b TEXT);
> CREATE UNIQUE INDEX MT_a ON MyTable (a);
> INSERT <40 rows of data into MyTable>
>
> SELECT a,b FROM MyTable ORDER BY a;
>
> The assumed plan would be to use the index to retrieve the row order, then
> to look up each retrieved row in the table to retrieve the value for b.
> This requires one index walk plus 40 table lookups.
>
> But the engine knows that 40 table lookups takes a long time.  It would be
> faster to read the table, then sort it internally.  It's a table with only
> 40 rows, so sorting it would be fast and take only a little memory.  That
> saves 40 lookups.
>
> So even though there's an index, it's not a covering index (it doesn't
> contain all the data needed) so it won't be used.
>

That's understandable and I was expecting that. That's why I populated the
test dataset with *muuuch* more data -- to avoid corner cases like this.

Thanks again for your patience ;-)
Gerlando
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

Keith Medcalf

>I wonder if I'd be allowed to add an ORDER BY in the subquery and if
>that would make any difference -- I remember reading ORDER BY is only
>allowed in the outer query (which makes perfect sense).

Yes, you can use an order by in a subquery (either a correlated subquery or a table generating subquery).  And you can use one in the outer query which will be used to order the results.  Whether the order-by clause in a non-correlated subquery has any meaning depends on how the query planner decides to perform the query.  The order-by in a table generating subquery (that is not correlated) may be "pushed" to the outer query just as a where clause in the subquery may also get pushed to the outer query as well if the query is flattened or if doing so results in a more optimum plan, or so I have observed ...

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Reply | Threaded
Open this post in threaded view
|

Re: Min/Max and skip-scan optimizations

James K. Lowden
In reply to this post by Gerlando Falauto
On Mon, 4 Feb 2019 18:55:33 +0100
Gerlando Falauto <[hidden email]> wrote:

> I remember reading ORDER BY is only allowed in
> the outer query

As Keith said, SQLite allows ORDER BY in subqueries.  The SQL standard
does not.  

Logically, ORDER BY makes sense only for the outer query.  An SQL
SELECT statement decribes a set of results, possibly presented in a
particular order. An "internal ORDER BY" describes neither selection
criteria nor presentation order.  

Technically, ORDER BY takes a tabular result as input and produces a
cursor as output.  It need not sort the results; all that's required is
that the cursor return successive rows in the prescribed order.  

SQLite extends ORDER BY with LIMIT.  Because the combination affects
more than just the order, it can be useful to use ORDER BY in a
subquery.  Now that window functions provide a (more convenient)
standard way to produce row numbers, LIMIT is a bit of anachronism but,
for reasons of backwards compatibility, is unlikely to be removed.  

--jkl
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
12